Abstract

Convergence properties of Shannon entropy are studied. In the differential setting, it is
shown that weak convergence of probability measures, or convergence in distribution, is not
enough for convergence of the associated differential entropies. A general result for the
desired differential entropy convergence is provided, taking into account both compactly and
non-compactly supported densities. Convergence of differential entropy is also characterized
in terms of the Kullback-Leibler discriminant for densities with fairly general supports, and
it is shown that convergence in variation of probability measures guarantees such convergence
under an appropriate boundedness condition on the densities involved. Results for the discrete
setting are also provided, allowing for infinitely supported probability measures, by taking
advantage of the equivalence between weak convergence and convergence in variation in this
setting.

1 Introduction



Convergence of a sequence of probability measure entropies plays a key role in information 
theory, from both theoretical and applied points of view, often appearing linked to the 
problem of estimation of the entropy of a source [1-4]. 

As is usual in information theory, the first order of business is to understand the
problem in the context of discrete sources. Some of the convergence results can be
found in today's standard textbooks of the area [5,6] and in some recent works [7]. A more
general approach can be found in the works of A. Barron, where a proof of the Central 
Limit Theorem based on entropy convergence [8] and the entropy convergence of stationary 
processes [9] are presented. The discussion of information topologies for general sources 
[10] touches tangentially the problem of convergence in a more general setting. 

However, the focus of many of these works has been on continuity rather than con- 
vergence properties of Shannon entropy. On the one hand, continuity properties embrace 
results guaranteeing convergence of entropy for all approximating sequences of probability 
measures converging, in a certain topology, to a given limiting probability measure. The
emphasis there is on identifying the largest class of probability measures for which the
corresponding convergence of entropy takes place for all approximating sequences. On the
other hand, convergence properties are usually related to deciding whether convergence of 
entropy takes place for a given, fixed family of probability measures, also converging in a 
certain topology to a limiting probability measure. Whereas in the continuity context all re- 
quirements are imposed on the limiting probability measure, in order to ensure convergence 
of entropy for all possible approximating sequences, in the pure convergence context one 
can and should exploit any underlying structure of the particular approximating sequence 
at hand, as usually done in applied probability problems. 

The purpose of this paper is to present general conditions for the convergence of entropy 
sequences associated to both discrete and continuous sources, over possibly infinite or non- 
compactly supported alphabets, respectively. 

In the case of continuous sources, results of this kind can be used in applications where
one is confronted with the problem of deciding whether the sequence of differential entropies
associated with a family of probability densities $\{p_n\}_{n=1}^{\infty}$ on $\mathbb{R}^k$,
each term of the sequence given by

$$-\int p_n(x)\log[p_n(x)]\,dx, \tag{1}$$

with $dx$ denoting Lebesgue measure, converges as $n$ increases to infinity to the respective
differential entropy associated to the limiting density of the family (assuming such limiting
density exists in some appropriate sense).

In general, only numerical computation of the sequence elements (1) is possible, making
it difficult to conclude the desired convergence in an abstract sense. Such convergence
must then be established by exploiting underlying properties or structures of the sequence
$\{p_n\}_{n=1}^{\infty}$ itself and of its limit.

If we assume pointwise convergence of the corresponding integrands, two main
convergence-related results from real analysis are at our disposal: the monotone and dominated
convergence theorems for Lebesgue integrals. On the one hand, the monotone convergence
theorem provides no help for this problem: if each $p_n$ is a probability density
function and, as such, satisfies the normalization condition

$$\int p_n\,dx = 1,$$

then monotonicity of the sequence $\{p_n\}_{n=1}^{\infty}$ is only possible in the trivial case
when all densities coincide for almost every $x$. On the other hand, the dominated convergence
theorem requires the construction of a function $f$ such that

$$|p_n(x)\log[p_n(x)]| \le f(x), \tag{2}$$

for each $n$ and $x$, and

$$\int f\,dx < \infty, \tag{3}$$

and such a construction is in general difficult to carry out.

Though it is usually easier to check, rather than (2) and (3), whether the boundedness
condition

$$\sup_{n,x} |p_n(x)| < \infty$$

holds, implying then $M = \sup_{n,x} |p_n(x)\log[p_n(x)]| < \infty$, such a condition is not
enough for the application of the dominated convergence theorem in the case of densities
supported over a set $\mathbb{X}$ of infinite Lebesgue measure, since $f$ cannot be taken as the
constant function $M$ ($> 0$) in that case ($\int_{\mathbb{X}} M\,dx = M\int_{\mathbb{X}} dx = \infty$
if $\mathbb{X}$ has infinite Lebesgue measure). We show, however, that appropriate absolute
continuity properties of measures provide a suitable boundedness condition that can be used, in
conjunction with the dominated convergence theorem, to establish the desired convergence of the
associated differential entropies, and of the Kullback-Leibler discriminant as well, for
densities with fairly general supports.
Our result holds independently of the non-compact, or even infinite Lebesgue measure
nature of the supports involved. This is accomplished by exploiting the fact that for a
density $p$ on $\mathbb{X} \subseteq \mathbb{R}^k$, though the Lebesgue measure of $\mathbb{X}$
may be infinite if $\mathbb{X}$ is unbounded, the total mass $\int_{\mathbb{X}} p\,dx$ is not.
The value of the result lies in the fact that it does not require the
construction of any additional function (such as $f$ above), as it relies exclusively on the
structure of the densities involved. We also show that convergence in distribution of the
respective probability measures is not enough to have convergence of the corresponding
differential entropies, which reinforces the importance of establishing general conditions for
such convergence to take place.

The paper also provides a characterization of convergence of differential entropies in
terms of the Kullback-Leibler discriminant, again for densities with fairly general supports.
Moreover, we show that under an appropriate boundedness condition on the densities
involved, convergence in variation of probability measures does indeed guarantee the desired
differential entropy convergence.

In the discrete setting, the paper shows that convergence in distribution and in variation
of probability measures are equivalent. In particular, if the probability measures have finite
support then convergence of their respective entropies and of the Kullback-Leibler discriminant
follows immediately. In the case of probability mass functions with infinite supports,
we exploit the aforementioned equivalence between weak convergence and convergence in
variation to establish the convergence of entropies and of the Kullback-Leibler discriminant.

The organization of the paper is as follows. In Section 2 we introduce notational and
terminological conventions used throughout the paper, as well as the necessary elements
from the theory of convergence of probability measures. (Most of the definitions in this
section apply to both the continuous and discrete case, when Lebesgue measure does not
play a role.) Sections 3 and 4 consider the case of continuous random variables. In
Section 3 we show that convergence in distribution of the underlying probability measures
is not enough to have convergence of the associated differential entropies, characterizing
such convergence for densities with fairly general supports in terms of the Kullback-Leibler
discriminant and showing that, under an appropriate boundedness condition on the densities
involved, convergence in variation of probability measures does guarantee the desired
differential entropy convergence. In Section 4 we provide a general result for convergence
of differential entropy and Kullback-Leibler discriminant under a pointwise convergence
condition, taking into account both compactly and non-compactly supported densities. In
Section 5 we deal with the discrete case. Finally, in Section 6 we present a summary of the
results and discuss their scope.



2 Preliminary Elements 

In this section we introduce the concepts (and related notation) upon which we elaborate 
the present work. Our presentation includes the notions of weak convergence, convergence 
in variation and a measure-theoretic definition of entropy of probability measures. 

2.1 Definitions 

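Let $k$ be a positive integer, $\mathbb{R}^k$ the $k$-dimensional Euclidean space endowed with
the usual Euclidean metric $\|\cdot\|_2$, and $\mathcal{B}(\mathbb{R}^k)$ the collection of Borel
sets in $\mathbb{R}^k$. Also, let $\mathbb{X} \in \mathcal{B}(\mathbb{R}^k)$, $\mathbb{X}$ closed,
be a Polish subspace, i.e., $\mathbb{X}$ is separable (it has a countable dense subset) and
complete (every Cauchy sequence in $\mathbb{X}$ converges to a point $x \in \mathbb{X}$) [11, 12].
We denote as $\mathcal{AC}(\mathbb{X})$ the collection of all probability measures $\mu$ on
$(\mathbb{X}, \mathcal{B}(\mathbb{X}))$ which are absolutely continuous with respect to (w.r.t.)
the Lebesgue measure in $\mathbb{X}$ (denoted as $dx$), i.e., having the representation

$$\mu(A) = \int_A \frac{d\mu}{dx}\,dx,$$

$A \in \mathcal{B}(\mathbb{X})$, with $\frac{d\mu}{dx} : \mathbb{X} \to \mathbb{R}_+ = [0,\infty)$,
Borel measurable, the Radon-Nikodym derivative or density of $\mu$ w.r.t. $dx$. Of course, when
considering $\mathcal{AC}(\mathbb{X})$ we assume $\mathbb{X}$ is such that
$\mathcal{AC}(\mathbb{X}) \neq \emptyset$ (i.e., $\mathbb{X}$ having strictly positive Lebesgue
measure). In the same way, we denote as $\mathcal{AC}_+(\mathbb{X})$ the set of all
$\mu \in \mathcal{AC}(\mathbb{X})$ for which $\frac{d\mu}{dx} > 0$ Lebesgue-almost everywhere on
$\mathbb{X}$. In particular, $\mu \in \mathcal{AC}_+(\mathbb{X})$ implies that $\mu$ and $dx$ are
mutually absolutely continuous or equivalent, and that

$$\frac{dx}{d\mu}(x) = \begin{cases} \left[\frac{d\mu}{dx}(x)\right]^{-1} & \text{if } \frac{d\mu}{dx}(x) > 0, \\ a & \text{otherwise,} \end{cases} \tag{4}$$

with $a \in \mathbb{R}_+$ any constant value, provides indeed a valid expression for the
Radon-Nikodym derivative $\frac{dx}{d\mu}$ (since $\mu \in \mathcal{AC}_+(\mathbb{X})$, the set
$\{x \in \mathbb{X} : \frac{d\mu}{dx}(x) = 0\}$ is Lebesgue-null).

In addition, let $f : \mathbb{X} \to \mathbb{R}$ be a real-valued function. Its support is the
closure of the set of all $x \in \mathbb{X}$ where $f(x)$ is strictly positive, i.e.,

$$\operatorname{support}(f) = \overline{\{x \in \mathbb{X} : f(x) > 0\}},$$

the overline $\overline{\{\cdot\}}$ denoting closure. In particular, the Lebesgue measures of the
sets $\operatorname{support}(\frac{d\mu}{dx})$ and $\mathbb{X}$ coincide when
$\mu \in \mathcal{AC}_+(\mathbb{X})$.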

2.2 Convergence of probability measures 

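We now collect some basic definitions and results, in the context needed for the next
sections of the paper. Throughout, $\mathcal{P}(\mathbb{X})$ denotes the collection of all
probability measures on $(\mathbb{X}, \mathcal{B}(\mathbb{X}))$ and $C(\mathbb{X})$ (resp.,
$C_b(\mathbb{X})$) the space of all continuous (resp., bounded and continuous), real-valued
functions on $\mathbb{X}$.

Definition 2.1. A sequence $\{\mu_n\}_{n=1}^{\infty} \subseteq \mathcal{P}(\mathbb{X})$ is said to
converge weakly to $\mu \in \mathcal{P}(\mathbb{X})$, denoted $\mu_n \Rightarrow \mu$ as
$n \uparrow \infty$, if

$$\int_{\mathbb{X}} f\,d\mu_n \to \int_{\mathbb{X}} f\,d\mu$$

as $n \uparrow \infty$ for each $f \in C_b(\mathbb{X})$.

Since $\mathbb{X}$ is separable, weak convergence $\mu_n \Rightarrow \mu$ as $n \uparrow \infty$ of
$\{\mu_n\}_{n=1}^{\infty} \subseteq \mathcal{P}(\mathbb{X})$ to $\mu \in \mathcal{P}(\mathbb{X})$ is
equivalent to convergence $\rho(\mu_n, \mu) \to 0$, as $n \uparrow \infty$ as well, with
$\rho(\cdot,\cdot)$ denoting the Prohorov metric on
$\mathcal{P}(\mathbb{X}) \times \mathcal{P}(\mathbb{X})$, i.e.,

$$\rho(\sigma_1, \sigma_2) = \inf\{\epsilon > 0 : \sigma_1(A) \le \sigma_2(A^{\epsilon}) + \epsilon,\ \sigma_2(A) \le \sigma_1(A^{\epsilon}) + \epsilon\ \ \forall A \in \mathcal{B}(\mathbb{X})\},$$

$\sigma_1, \sigma_2 \in \mathcal{P}(\mathbb{X})$, where for $A \subseteq \mathbb{X}$, $\epsilon > 0$
and $y \in \mathbb{X}$, $A^{\epsilon} = \{x \in \mathbb{X} : d(x, A) < \epsilon\}$ and
$d(y, A) = \inf\{\|y - z\|_2 : z \in A\}$. Note $A^{\epsilon}$ is open in $\mathbb{X}$, and hence
$A^{\epsilon} \in \mathcal{B}(\mathbb{X})$. In addition, since $\mathbb{X}$ is not just separable
but Polish, $(\mathcal{P}(\mathbb{X}), \rho)$ is Polish too [13].

Weak convergence in $\mathcal{P}(\mathbb{R}^k)$ is also equivalent to the standard convergence in
distribution. (Note $\sigma \in \mathcal{P}(\mathbb{X})$ can always be looked at as an element of
$\mathcal{P}(\mathbb{R}^k)$ by setting $\sigma(A)$ to $\sigma(A \cap \mathbb{X})$ for
$A \in \mathcal{B}(\mathbb{R}^k)$.) Indeed, for
$\{\mu_n\}_{n=1}^{\infty} \subseteq \mathcal{P}(\mathbb{R}^k)$ and
$\mu \in \mathcal{P}(\mathbb{R}^k)$, we have $\mu_n \Rightarrow \mu$ as $n \uparrow \infty$ if and
only if, as $n \uparrow \infty$ as well,

$$F_n(x) \to F(x)$$

at each $x \in \mathbb{R}^k$ point of continuity of $F$, where $F_n$ and $F$ denote the
distribution functions associated to $\mu_n$ and $\mu$, respectively, i.e.,

$$F_n(x) = \mu_n\big((-\infty, x_1] \times \cdots \times (-\infty, x_k]\big)$$

and

$$F(x) = \mu\big((-\infty, x_1] \times \cdots \times (-\infty, x_k]\big)$$

for each $x = (x_1, \ldots, x_k) \in \mathbb{R}^k$. In fact, the following result holds.

Lemma 2.1. Let $\{\mu_n\}_{n=1}^{\infty} \subseteq \mathcal{P}(\mathbb{X})$ and
$\mu \in \mathcal{P}(\mathbb{X})$. We have $\mu_n \Rightarrow \mu$ as $n \uparrow \infty$ if and
only if, as $n \uparrow \infty$ as well,

$$\mu_n(A) \to \mu(A)$$

for each $A \in \mathcal{B}(\mathbb{X})$ being a $\mu$-continuity set, i.e., such that
$\mu(\partial A) = 0$ with $\partial A$ denoting the boundary of $A$:
$\partial A = \{x \in \mathbb{X} : x \in \overline{A},\ x \notin A^{\circ}\}$ (closure minus interior).

Proof. See Portmanteau's Theorem, [13, Theorem 2.1, p. 16]. □

Another important mode of convergence for probability measures, stronger than weak
convergence, is the so-called convergence in variation, associated with the distance in
variation between probability measures.

Definition 2.2. The distance in variation between $\sigma_1 \in \mathcal{P}(\mathbb{X})$ and
$\sigma_2 \in \mathcal{P}(\mathbb{X})$ is the real number $\|\sigma_1 - \sigma_2\|_V \in [0, 2]$
given by

$$\|\sigma_1 - \sigma_2\|_V = \sup_{f \in \mathcal{M}(\mathbb{X}),\ |f| \le 1} \left| \int_{\mathbb{X}} f\,d\sigma_1 - \int_{\mathbb{X}} f\,d\sigma_2 \right|,$$

where $\mathcal{M}(\mathbb{X})$ denotes the collection of all $\mathbb{R}^*$-valued, Borel
measurable functions on $\mathbb{X}$, $\mathbb{R}^* = \mathbb{R} \cup \{\pm\infty\} = [-\infty, \infty]$
is the extended real line, and $1(x) = 1$, $x \in \mathbb{X}$. In particular, we have that
$\|\cdot - \cdot\|_V : \mathcal{P}(\mathbb{X}) \times \mathcal{P}(\mathbb{X}) \to [0, 2]$ is indeed
a metric on $\mathcal{P}(\mathbb{X})$. Moreover, a sequence
$\{\mu_n\}_{n=1}^{\infty} \subseteq \mathcal{P}(\mathbb{X})$ is said to converge in variation to
$\mu \in \mathcal{P}(\mathbb{X})$ if

$$\|\mu_n - \mu\|_V \to 0$$

as $n \uparrow \infty$.

Distance in variation can alternatively be characterized as

$$\|\sigma_1 - \sigma_2\|_V = 2 \sup_{A \in \mathcal{B}(\mathbb{X})} |\sigma_1(A) - \sigma_2(A)|,$$

$\sigma_1, \sigma_2 \in \mathcal{P}(\mathbb{X})$ [14].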
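
As a small sanity check of the set-based characterization, the following sketch compares it with the sum of absolute differences for two probability mass functions on a four-point set. The discrete setting, the sample vectors `p` and `q`, and the function names are illustrative assumptions, not taken from the paper; the brute-force search over subsets is only practical for very small supports.

```python
from itertools import combinations

def variation_l1(p, q):
    """Distance in variation as the sum of |p_i - q_i| over a common finite set."""
    return sum(abs(a - b) for a, b in zip(p, q))

def variation_sup_sets(p, q):
    """2 * sup_A |p(A) - q(A)|, found by brute force over all subsets A."""
    n = len(p)
    best = 0.0
    for r in range(n + 1):
        for idx in combinations(range(n), r):
            gap = abs(sum(p[i] for i in idx) - sum(q[i] for i in idx))
            best = max(best, gap)
    return 2.0 * best

# Hypothetical example distributions (assumptions for illustration only).
p = [0.5, 0.3, 0.2, 0.0]
q = [0.25, 0.25, 0.25, 0.25]
d_l1 = variation_l1(p, q)
d_sup = variation_sup_sets(p, q)
```

Both quantities agree and lie in $[0, 2]$, matching the range stated in Definition 2.2.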



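As mentioned before, convergence in variation is stronger than weak convergence. Indeed,
we have the following result.

Lemma 2.2. Let $\{\mu_n\}_{n=1}^{\infty} \subseteq \mathcal{P}(\mathbb{X})$ and
$\mu \in \mathcal{P}(\mathbb{X})$. If $\|\mu_n - \mu\|_V \to 0$ as $n \uparrow \infty$, then
$\mu_n \Rightarrow \mu$ as $n \uparrow \infty$.

Proof. Since $\sup_{A \in \mathcal{B}(\mathbb{X})} |\mu_n(A) - \mu(A)| \to 0$ as
$n \uparrow \infty$, we conclude $\mu_n(A) \to \mu(A)$ for all $A \in \mathcal{B}(\mathbb{X})$, as
$n \uparrow \infty$ as well, and therefore in particular for each $A$ being a $\mu$-continuity
set. The lemma then follows from Lemma 2.1. □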

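For $\mu \in \mathcal{P}(\mathbb{X})$ and $p \in [1, \infty)$ we define

$$L^p(d\mu) = \left\{ f \in \mathcal{M}(\mathbb{X}) : \int_{\mathbb{X}} |f|^p\,d\mu < \infty \right\},$$

with the standard convention $0 \cdot [\pm\infty] = 0$, and the $L^p(d\mu)$-norm of
$f \in L^p(d\mu)$ as

$$\|f\|_{L^p(d\mu)} = \left( \int_{\mathbb{X}} |f|^p\,d\mu \right)^{1/p}.$$

For $\mu \in \mathcal{P}(\mathbb{X})$ we denote as $L^{\infty}(d\mu)$ the space of all functions
$f \in \mathcal{M}(\mathbb{X})$ which are bounded except possibly on a $\mu$-null set, and define
the $L^{\infty}(d\mu)$-norm of $f \in L^{\infty}(d\mu)$ as usual, i.e.,

$$\|f\|_{L^{\infty}(d\mu)} = (\mu)\operatorname{ess\,sup}_{x \in \mathbb{X}} |f(x)|,$$

where for $g \in \mathcal{M}(\mathbb{X})$, $(\mu)\operatorname{ess\,sup}_{x \in \mathbb{X}} g(x)$,
the essential supremum of $g$ w.r.t. $\mu$, is the infimum of $\sup_{x \in \mathbb{X}} h(x)$ as $h$
ranges over all functions mapping $\mathbb{X}$ into $\mathbb{R}^*$ which are equal to $g$
$\mu$-almost everywhere. Thus, for $f \in L^{\infty}(d\mu)$ we have

$$\|f\|_{L^{\infty}(d\mu)} = \inf\{ M \in \mathbb{R}_+ : \mu(\{x \in \mathbb{X} : |f(x)| > M\}) = 0 \}.$$

(Also, just as $L^{p_2}(d\mu) \subseteq L^{p_1}(d\mu)$ for $1 \le p_1 < p_2 < \infty$,
$f \in L^{\infty}(d\mu)$ implies $f \in L^p(d\mu)$ for each $p \in [1, \infty)$ and
$\mu \in \mathcal{P}(\mathbb{X})$. In fact, $\|f\|_{L^p(d\mu)} \le \|f\|_{L^{\infty}(d\mu)}$.)

Remark 2.1. For any given $\mu \in \mathcal{P}(\mathbb{X})$, the spaces
$(L^p(d\mu), \|\cdot\|_{L^p(d\mu)})$, $p \in [1, \infty]$, become normed linear spaces with the
usual addition and scalar multiplication of functions, and in fact Banach spaces, provided we
treat measurable functions coinciding $\mu$-almost everywhere as equivalent [15].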



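This notion is useful to determine another characterization of distance in variation. Let
$\sigma_1$ and $\sigma_2$ be two measures in $\mathcal{AC}(\mathbb{X})$. Then [14]

$$\|\sigma_1 - \sigma_2\|_V = \left\| \frac{d\sigma_1}{dx} - \frac{d\sigma_2}{dx} \right\|_{L^1(dx)} = \int_{\mathbb{X}} \left| \frac{d\sigma_1}{dx} - \frac{d\sigma_2}{dx} \right| dx.$$

Also note that

$$\frac{d\sigma_1}{dx} - \frac{d\sigma_2}{dx} \in L^1(dx)$$
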

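for all $\sigma_1, \sigma_2$ in $\mathcal{AC}(\mathbb{X})$. Though not explicitly used in the
paper, consider $p \in (1, \infty)$,
$\{\mu_n\}_{n=1}^{\infty} \subseteq \mathcal{AC}(\mathbb{X})$ and $\mu \in \mathcal{AC}(\mathbb{X})$
such that $\{\frac{d\mu_n}{dx} - \frac{d\mu}{dx}\}_{n=1}^{\infty} \subseteq L^p(dx)$. When
$\mathbb{X}$ has finite Lebesgue measure $\lambda(\mathbb{X})$, Hölder's inequality gives

$$\left\| \frac{d\mu_n}{dx} - \frac{d\mu}{dx} \right\|_{L^1(dx)} \le \lambda(\mathbb{X})^{1 - 1/p} \left\| \frac{d\mu_n}{dx} - \frac{d\mu}{dx} \right\|_{L^p(dx)}$$

for each $n \in \{1, 2, \ldots\}$, so that convergence in $L^p(dx)$ of the corresponding densities,
$\|\frac{d\mu_n}{dx} - \frac{d\mu}{dx}\|_{L^p(dx)} \to 0$ as $n \uparrow \infty$, implies, the same
as convergence in $L^1(dx)$, convergence $\|\mu_n - \mu\|_V \to 0$, as $n \uparrow \infty$ as well.
(Without the finite-measure assumption on $\mathbb{X}$, the $L^1(dx)$ norm is not controlled by
the $L^p(dx)$ norm.)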



2.3 Entropy 

We conclude this section by writing a general definition of entropy of probability measures,
on measure-theoretic grounds. In the sequel all logarithms are understood to be to the
base 2.

The space of measures

$$\mathbb{H}(\mathbb{X}) = \left\{ \mu \in \mathcal{AC}(\mathbb{X}) : \log\left[\frac{d\mu}{dx}\right] \in L^1(d\mu) \right\},$$

with the convention $\log[0] = -\infty$, represents the set of measures with well-defined entropy.

Definition 2.3. The Shannon Differential Entropy, associated to the underlying space
$\mathbb{X}$, is the mapping $\mathcal{H} : \mathbb{H}(\mathbb{X}) \to \mathbb{R}$ assigning to
each $\mu \in \mathbb{H}(\mathbb{X})$ the value $\mathcal{H}[\mu] \in \mathbb{R}$ given by

$$\mathcal{H}[\mu] = -\int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu.$$

$\mathcal{H}[\mu]$ is called the Shannon Differential Entropy of $\mu$.
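
To make the definition concrete, here is a minimal numerical sketch that approximates $-\int p \log_2 p\,dx$ for a density $p$ on an interval with a midpoint rule. The two example densities (uniform on $[0,4]$ and the triangular density $2x$ on $[0,1]$), the function name, and the step count are assumptions chosen for illustration; the closed-form values used for comparison are $\log_2 4 = 2$ and $\frac{1}{2\ln 2} - 1$, respectively.

```python
import math

def differential_entropy_bits(density, a, b, steps=100_000):
    """Midpoint-rule approximation of -integral of p*log2(p) over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        p = density(a + (i + 0.5) * h)
        if p > 0.0:                      # convention: 0 * log 0 = 0
            total -= p * math.log2(p) * h
    return total

# Uniform density on [0, 4]: entropy log2(4) = 2 bits.
h_uniform = differential_entropy_bits(lambda x: 0.25, 0.0, 4.0)

# Triangular density 2x on [0, 1]: entropy 1/(2 ln 2) - 1, a negative value.
h_triangular = differential_entropy_bits(lambda x: 2.0 * x, 0.0, 1.0)
```

The second value is negative, a reminder that differential entropy, unlike discrete entropy, is not bounded below by zero.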



3 Convergence of Differential Entropy: Characterization in Terms of Weak Convergence,
Convergence in Variation and the Kullback-Leibler Discriminant

In this section we illustrate by means of a counterexample how weak convergence of
probability measures is not enough for convergence of the associated differential entropies. We
characterize the desired differential entropy convergence, for densities with fairly general
supports, in terms of the Kullback-Leibler discriminant, also showing that under an appropriate
boundedness condition on the densities involved, convergence in variation of the underlying
probability measures does indeed guarantee differential entropy convergence.

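Consider the space $\mathbb{X} = [0, 1]$, and define the probability measures (taken from [13])
$\mu$ and $\mu_n$ in $\mathcal{AC}([0, 1])$ by setting, for each $x \in [0, 1]$ and
$n \in \{1, 2, \ldots\}$,

$$\frac{d\mu}{dx}(x) = 1$$

and

$$\frac{d\mu_n}{dx}(x) = n^2 \sum_{k=0}^{n-1} 1\left\{ x \in \left[ \frac{k}{n}, \frac{k}{n} + \frac{1}{n^3} \right) \right\},$$

where, as customary for $A \subseteq \mathbb{X}$, $1\{x \in A\} = 1$ if $x \in A$ and
$1\{x \in A\} = 0$ if $x \in \mathbb{X} \setminus A$, with
$\mathbb{X} \setminus A = \{x \in \mathbb{X} : x \notin A\}$, the usual set-theoretic difference.
Of course,

$$\mu(A) = \int_A d\mu = \int_A \frac{d\mu}{dx}\,dx$$

and

$$\mu_n(A) = \int_A d\mu_n = \int_A \frac{d\mu_n}{dx}\,dx$$

for each $A \in \mathcal{B}([0, 1])$. Note $\mu$ is nothing but Lebesgue measure in $[0, 1]$. Also,
it is easy to see that, for each $n \in \{1, 2, \ldots\}$,

$$|\mu_n([0, x]) - \mu([0, x])| \le \frac{1}{n}\left(1 - \frac{1}{n^2}\right)$$

for all $x \in [0, 1]$, and therefore $\mu_n \Rightarrow \mu$ as $n \uparrow \infty$. On the other
hand, we obviously have $\mu \in \mathbb{H}([0, 1])$ and
$\{\mu_n\}_{n=1}^{\infty} \subseteq \mathbb{H}([0, 1])$. In fact,

$$\mathcal{H}[\mu] = -\int_{[0,1]} \log\left[\frac{d\mu}{dx}\right] d\mu = -\log[1] \int_{[0,1]} d\mu = 0$$

and, for each $n \in \{1, 2, \ldots\}$,

$$\mathcal{H}[\mu_n] = -\int_{[0,1]} \log\left[\frac{d\mu_n}{dx}\right] d\mu_n = -\int_{[0,1]} \log\left[\frac{d\mu_n}{dx}\right] \frac{d\mu_n}{dx}\,dx = -n^2 \log[n^2] \int_{\bigcup_{k=0}^{n-1} \left[\frac{k}{n}, \frac{k}{n} + \frac{1}{n^3}\right)} dx = -2\log[n],$$

where for the last equality above we have used the fact that the Lebesgue measure of the set
$\bigcup_{k=0}^{n-1} [\frac{k}{n}, \frac{k}{n} + \frac{1}{n^3})$ is $\frac{1}{n^2}$. Hence, we have
$\mu_n \Rightarrow \mu$ as $n \uparrow \infty$, but $\mathcal{H}[\mu_n] \downarrow -\infty$ as
$n \uparrow \infty$, i.e.,

$$\mathcal{H}[\mu_n] \downarrow -\infty \neq 0 = \mathcal{H}[\mu].$$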
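
The two competing behaviours in this counterexample can be checked numerically. The sketch below assumes the spike density described above (height $n^2$ on $n$ intervals of length $1/n^3$ starting at $k/n$); the function names and the grid resolution are illustrative choices. It evaluates the entropy in closed form and approximates the largest gap between the distribution function of $\mu_n$ and that of Lebesgue measure.

```python
import math

def entropy_mu_n_bits(n):
    """H[mu_n] for the density n^2 on n intervals of length 1/n^3 each."""
    support_measure = n * (1.0 / n**3)            # Lebesgue measure 1/n^2
    return -(n**2) * math.log2(n**2) * support_measure

def cdf_mu_n(n, x):
    """mu_n([0, x]) for x in [0, 1]; each cell [k/n, (k+1)/n) carries mass 1/n."""
    k = min(int(math.floor(x * n)), n - 1)        # index of the cell containing x
    inside = min(max(x - k / n, 0.0), 1.0 / n**3) # length of the spike covered so far
    return k / n + (n**2) * inside

def max_cdf_gap(n, grid=20_000):
    """Approximate sup_x |mu_n([0,x]) - x|, including the right ends of the spikes."""
    points = [i / grid for i in range(grid + 1)]
    points += [k / n + 1.0 / n**3 for k in range(n)]
    return max(abs(cdf_mu_n(n, x) - x) for x in points)

entropy_10 = entropy_mu_n_bits(10)     # equals -2*log2(10)
entropy_100 = entropy_mu_n_bits(100)   # equals -2*log2(100)
gap_10 = max_cdf_gap(10)               # close to 1/10 - 1/1000
```

The distribution-function gap shrinks like $1/n$ while the entropy decreases without bound, which is exactly the mismatch the counterexample exhibits.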

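The previous counterexample shows that weak convergence of probability measures is
not enough for convergence of the respective differential entropies. It is interesting to note
that in the example, though $\mu_n \Rightarrow \mu$ as $n \uparrow \infty$, pointwise convergence
of the family of densities $\{\frac{d\mu_n}{dx}\}_{n=1}^{\infty}$ to $\frac{d\mu}{dx}$ fails to hold
Lebesgue-almost everywhere. Indeed, as mentioned before, we have with
$A_n = \bigcup_{k=0}^{n-1} [\frac{k}{n}, \frac{k}{n} + \frac{1}{n^3})$ that

$$\mu(A_n) = \frac{1}{n^2}$$

for each $n \in \{1, 2, \ldots\}$, and therefore

$$\sum_{n=1}^{\infty} \mu(A_n) = \sum_{n=1}^{\infty} \frac{1}{n^2} < \infty.$$

Hence, by the Borel-Cantelli Lemma, [11, Lemma 3, p. 78],

$$\mu\left( \limsup_{n \uparrow \infty} A_n \right) = 0,$$

where as usual, $\limsup_{n \uparrow \infty} A_n = \bigcap_{n=1}^{\infty} \bigcup_{m=n}^{\infty} A_m$.
Then, for $\mu$-almost every $x \in [0, 1]$ there exists $n_x \in \{1, 2, \ldots\}$ such that
$x \notin A_n$ for all $n \in \{n_x, n_x + 1, \ldots\}$. Hence, we conclude

$$\frac{d\mu_n}{dx}(x) \to 0$$

as $n \uparrow \infty$, for $\mu$-almost every $x \in [0, 1]$ as well. Thus, we have

$$\lim_{n \uparrow \infty} \frac{d\mu_n}{dx}(x) = 0 \neq 1 = \frac{d\mu}{dx}(x)$$

for $\mu$-almost every $x \in [0, 1]$, i.e., pointwise convergence of
$\{\frac{d\mu_n}{dx}\}_{n=1}^{\infty}$ to $\frac{d\mu}{dx}$ fails to hold Lebesgue-almost
everywhere in $[0, 1]$.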

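Instead of asking for an appropriate pointwise convergence condition, as we do in the
next section, we now characterize the desired convergence
$\mathcal{H}[\mu_n] \to \mathcal{H}[\mu]$ as $n \uparrow \infty$ in terms of the Kullback-Leibler
discriminant. Some definitions are in order before establishing the result.

For $\mu \in \mathcal{P}(\mathbb{X})$ we denote as $\mathcal{AC}(\mathbb{X}\|\mu)$ the set of all
$\sigma \in \mathcal{P}(\mathbb{X})$ that are absolutely continuous w.r.t. $\mu$, i.e., having the
representation

$$\sigma(A) = \int_A \frac{d\sigma}{d\mu}\,d\mu,$$

$A \in \mathcal{B}(\mathbb{X})$, with $\frac{d\sigma}{d\mu} : \mathbb{X} \to \mathbb{R}_+$, Borel
measurable, the Radon-Nikodym derivative or density of $\sigma$ w.r.t. $\mu$. Also, we set

$$\mathbb{H}(\mathbb{X}\|\mu) = \left\{ \sigma \in \mathcal{AC}(\mathbb{X}\|\mu) : \log\left[\frac{d\sigma}{d\mu}\right] \in L^1(d\sigma) \right\}.$$

Considering $\sigma \in \mathbb{H}(\mathbb{X}\|\mu)$ and
$\mathbb{X}_{\sigma} = \operatorname{support}(\frac{d\sigma}{d\mu})$ we have

$$\int_{\mathbb{X}} \log\left[\frac{d\sigma}{d\mu}\right] d\sigma = \int_{\mathbb{X}} \log\left[\frac{d\sigma}{d\mu}\right] \frac{d\sigma}{d\mu}\,d\mu = \int_{\mathbb{X}_{\sigma}} \log\left[\frac{d\sigma}{d\mu}\right] \frac{d\sigma}{d\mu}\,d\mu = \int_{\mathbb{X}_{\sigma}} \log\left[\frac{d\sigma}{d\mu}\right] d\sigma$$

and, by a standard application of Jensen's Inequality [6],

$$-\int_{\mathbb{X}_{\sigma}} \log\left[\frac{d\sigma}{d\mu}\right] d\sigma = \int_{\mathbb{X}_{\sigma}} \log\left[\left(\frac{d\sigma}{d\mu}\right)^{-1}\right] d\sigma \le \log\left[ \int_{\mathbb{X}_{\sigma}} \left(\frac{d\sigma}{d\mu}\right)^{-1} d\sigma \right] = \log\left[ \int_{\mathbb{X}_{\sigma}} \left(\frac{d\sigma}{d\mu}\right)^{-1} \frac{d\sigma}{d\mu}\,d\mu \right] = \log[\mu(\mathbb{X}_{\sigma})] \le 0,$$

with equality if and only if $\frac{d\sigma}{d\mu} = 1$, $\sigma$-almost everywhere (recall
$1(x) = 1$, $x \in \mathbb{X}$). Having noticed this, we make the following definition.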

Definition 3.1. The Shannon Relative Entropy, relative to $\mu \in \mathcal{P}(\mathbb{X})$, is
the mapping $\mathcal{D}[\cdot\|\mu] : \mathbb{H}(\mathbb{X}\|\mu) \to \mathbb{R}_+$, assigning to
each $\sigma \in \mathbb{H}(\mathbb{X}\|\mu)$ the value
$\mathcal{D}[\sigma\|\mu] \in \mathbb{R}_+$ given by

$$\mathcal{D}[\sigma\|\mu] = \int_{\mathbb{X}} \log\left[\frac{d\sigma}{d\mu}\right] d\sigma.$$

$\mathcal{D}[\sigma\|\mu]$ is called the Shannon Relative Entropy between $\sigma$ and $\mu$, or
the Kullback-Leibler discriminant between $\sigma$ and $\mu$.

The Kullback-Leibler discriminant does not constitute a distance between probability
measures: it is not symmetric and does not satisfy the triangle inequality; indeed,
$\sigma \in \mathbb{H}(\mathbb{X}\|\mu)$ does not even imply
$\mu \in \mathcal{AC}(\mathbb{X}\|\sigma)$. It is nevertheless widely used as a notion of
closeness between probability measures, mainly because, as shown above,
$\mathcal{D}[\sigma\|\mu] \ge 0$ with equality if and only if $\frac{d\sigma}{d\mu} = 1$,
$\sigma$-almost everywhere.

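Before stating the result in the next theorem, we make the following remarks.

Remark 3.1. If $\sigma \in \mathcal{AC}(\mathbb{X})$ and $\mu \in \mathcal{AC}_+(\mathbb{X})$, then
$\sigma \in \mathcal{AC}(\mathbb{X}\|\mu)$. In fact, we can set

$$\frac{d\sigma}{d\mu} = \frac{d\sigma}{dx}\,\frac{dx}{d\mu}$$

with $\frac{dx}{d\mu}$ given by (4), as we do throughout. Then, when
$\sigma, \mu \in \mathcal{AC}_+(\mathbb{X})$ we have $\sigma \in \mathcal{AC}(\mathbb{X}\|\mu)$ and
$\mu \in \mathcal{AC}(\mathbb{X}\|\sigma)$, i.e., $\sigma$ and $\mu$ are mutually absolutely
continuous or equivalent. Moreover, in the (partial) converse direction,
$\sigma \in \mathcal{AC}(\mathbb{X})$ if $\sigma \in \mathcal{AC}(\mathbb{X}\|\mu)$ and
$\mu \in \mathcal{AC}(\mathbb{X})$, and we have
$\frac{d\sigma}{dx} = \frac{d\sigma}{d\mu}\frac{d\mu}{dx}$ Lebesgue-almost everywhere, and
$\frac{d\sigma}{d\mu} = \frac{d\sigma}{dx}[\frac{d\mu}{dx}]^{-1}$ $\mu$-almost everywhere. These
facts will be used in the sequel without any further comment.

Remark 3.2. From Pinsker's Inequality (see for example [10]), for any
$\mu \in \mathcal{P}(\mathbb{X})$ and
$\{\mu_n\}_{n=1}^{\infty} \subseteq \mathbb{H}(\mathbb{X}\|\mu)$ we have

$$\|\mu_n - \mu\|_V \le \sqrt{2 \ln 2 \; \mathcal{D}[\mu_n\|\mu]}$$

for each $n \in \{1, 2, \ldots\}$ and therefore, convergence $\mathcal{D}[\mu_n\|\mu] \to 0$ as
$n \uparrow \infty$ implies convergence $\|\mu_n - \mu\|_V \to 0$, as $n \uparrow \infty$ as well.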
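
The implication in Remark 3.2 can be illustrated numerically. The displayed form of Pinsker's Inequality is not legible in this copy, so the sketch below assumes one standard form: with the distance in variation normalized to $[0,2]$ and relative entropy measured in bits, $\|\sigma - \mu\|_V \le \sqrt{2 \ln 2\,\mathcal{D}[\sigma\|\mu]}$. The discrete distributions and function names are hypothetical and serve only as an example.

```python
import math

def kl_bits(p, q):
    """D[p||q] in bits for pmfs with p absolutely continuous w.r.t. q."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0.0)

def variation(p, q):
    """Distance in variation (values in [0, 2]) between two pmfs."""
    return sum(abs(a - b) for a, b in zip(p, q))

def pinsker_bound(p, q):
    """Assumed Pinsker-type upper bound sqrt(2 ln 2 * D) on the variation distance."""
    return math.sqrt(2.0 * math.log(2.0) * kl_bits(p, q))

q = [0.25, 0.25, 0.25, 0.25]
pairs = []
for n in (1, 2, 4, 8, 16):
    eps = 0.2 / n                      # perturbation shrinking with n
    p = [0.25 + eps, 0.25 - eps, 0.25, 0.25]
    pairs.append((variation(p, q), pinsker_bound(p, q)))
```

Each pair holds a variation distance and its bound; as the relative entropy shrinks, the bound forces the variation distance to shrink with it.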

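Theorem 3.1. Let $\{\mu_n\}_{n=1}^{\infty} \subseteq \mathbb{H}(\mathbb{X})$ and
$\mu \in \mathcal{AC}(\mathbb{X})$ be such that $\frac{d\mu}{dx}(x) > 0$, for each
$x \in \mathbb{X}$, and $\log[\frac{d\mu}{dx}] \in C_b(\mathbb{X})$. Then,
$\{\mu_n\}_{n=1}^{\infty} \subseteq \mathbb{H}(\mathbb{X}\|\mu)$, $\mu \in \mathbb{H}(\mathbb{X})$
and the following assertions are equivalent.

(i) $\mu_n \Rightarrow \mu$ and $\mathcal{H}[\mu_n] \to \mathcal{H}[\mu]$ as $n \uparrow \infty$.
(ii) $\mathcal{D}[\mu_n\|\mu] \to 0$ as $n \uparrow \infty$.

Proof. Since $\log[\frac{d\mu}{dx}]$ is in particular bounded on $\mathbb{X}$, we obviously have
$\mu \in \mathbb{H}(\mathbb{X})$. In addition,

$$\int_{\mathbb{X}} \left| \log\left[\frac{d\mu_n}{d\mu}\right] \right| d\mu_n \le \int_{\mathbb{X}} \left| \log\left[\frac{d\mu_n}{dx}\right] \right| d\mu_n + \int_{\mathbb{X}} \left| \log\left[\frac{d\mu}{dx}\right] \right| d\mu_n < \infty, \tag{5}$$

since also $\{\mu_n\}_{n=1}^{\infty} \subseteq \mathbb{H}(\mathbb{X})$. In particular,
$\{\mu_n\}_{n=1}^{\infty} \subseteq \mathbb{H}(\mathbb{X}\|\mu)$. Now, from equation (5) we may
write

$$\int_{\mathbb{X}} \log\left[\frac{d\mu_n}{d\mu}\right] d\mu_n = \int_{\mathbb{X}} \log\left[\frac{d\mu_n}{dx}\right] d\mu_n - \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu_n \tag{6}$$

and, since also $\mu \in \mathbb{H}(\mathbb{X})$, from equation (6) we conclude

$$\mathcal{D}[\mu_n\|\mu] = \mathcal{H}[\mu] - \mathcal{H}[\mu_n] + \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu - \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu_n. \tag{7}$$

But, since $\log[\frac{d\mu}{dx}] \in C_b(\mathbb{X})$, if $\mu_n \Rightarrow \mu$ as
$n \uparrow \infty$ we conclude that

$$\int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu_n \to \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu,$$

as $n \uparrow \infty$ as well, equation (7) proving then the implication (i) $\Rightarrow$ (ii).
The converse implication (ii) $\Rightarrow$ (i) also follows from equation (7), in view of
Remark 3.2 and Lemma 2.2. The theorem is then proved. □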
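
The identity behind the proof, $\mathcal{D}[\mu_n\|\mu] = \mathcal{H}[\mu] - \mathcal{H}[\mu_n] + \int \log[\frac{d\mu}{dx}]\,d\mu - \int \log[\frac{d\mu}{dx}]\,d\mu_n$, can be checked on piecewise-constant densities, where every integral reduces to a finite sum. The cell heights below are hypothetical values chosen so that each density integrates to one on $[0,1]$; the helper names are likewise illustrative.

```python
import math

def entropy_bits(heights, width):
    """Differential entropy of a piecewise-constant density with equal-width cells."""
    return -sum(h * math.log2(h) * width for h in heights if h > 0.0)

def cross_term(heights_a, heights_b, width):
    """Integral of log2(density_b) with respect to the measure with density_a."""
    return sum(a * math.log2(b) * width for a, b in zip(heights_a, heights_b))

def relative_entropy_bits(heights_a, heights_b, width):
    """D[a||b] for piecewise-constant densities on the same cells (b strictly positive)."""
    return sum(a * math.log2(a / b) * width
               for a, b in zip(heights_a, heights_b) if a > 0.0)

width = 0.25                       # four cells covering [0, 1]
mu = [0.4, 0.8, 1.2, 1.6]          # assumed limiting density, integrates to 1
mu_n = [1.0, 1.0, 0.5, 1.5]        # assumed approximating density, integrates to 1

lhs = relative_entropy_bits(mu_n, mu, width)
rhs = (entropy_bits(mu, width) - entropy_bits(mu_n, width)
       + cross_term(mu, mu, width) - cross_term(mu_n, mu, width))
```

Here `lhs` and `rhs` coincide up to rounding, so any convergence of the cross term translates directly into a link between entropy convergence and convergence of the discriminant.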
Remark 3.3. Consider $\mu \in \mathcal{AC}(\mathbb{X})$ with $\frac{d\mu}{dx}(x) > 0$ for each
$x \in \mathbb{X}$. Then, the set $\mathbb{X}$ does not necessarily need to be bounded for
$\log[\frac{d\mu}{dx}]$ to be bounded on $\mathbb{X}$. Indeed, consider for instance the uniform
distribution on any unbounded set $\mathbb{X}$ ($\subseteq \mathbb{R}^k$, $k > 1$) having finite
and strictly positive Lebesgue measure, as for example in

$$\mathbb{X} = \{x = (x_1, x_2) \in \mathbb{R}_+^2 : e^{-\lambda x_1} \ge x_2\}$$

with $\lambda \in (0, \infty)$. $\mathbb{X}$ so defined is an unbounded subset of $\mathbb{R}^2$.
However, since the Lebesgue measure of $\mathbb{X}$ is
$\int_{\mathbb{X}} dx = \lambda^{-1} \in (0, \infty)$, the uniform distribution on $\mathbb{X}$
satisfies, for all $x \in \mathbb{X}$,

$$\log\left[\frac{d\mu}{dx}(x)\right] = \log\left[\frac{1}{\int_{\mathbb{X}} dx}\right] = \log[\lambda],$$

trivially bounded on $\mathbb{X}$. In the same way, the set $\mathbb{X}$ does not necessarily need
to be bounded for $\log[\frac{d\mu}{dx}]$ to be an element of $L^{\infty}(dx)$.

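Remark 3.4. Since $\mathbb{X}$ is closed, we have $\log[\frac{d\mu}{dx}] \in C_b(\mathbb{X})$
whenever $\mathbb{X}$ is in addition bounded and $\mu \in \mathcal{AC}(\mathbb{X})$ with
$\frac{d\mu}{dx}(x) > 0$, for each $x \in \mathbb{X}$, and $\frac{d\mu}{dx} \in C(\mathbb{X})$.
Indeed, if $\mathbb{X} \subseteq \mathbb{R}^k$ is closed and bounded it is then compact, and
therefore for $\mu \in \mathcal{AC}(\mathbb{X})$ with $\frac{d\mu}{dx}(x) > 0$, for each
$x \in \mathbb{X}$, and $\frac{d\mu}{dx} \in C(\mathbb{X})$, there exist $m$ and $M$ in
$(0, \infty)$, $m < M$, such that $\frac{d\mu}{dx}(x) \in [m, M]$, for each $x \in \mathbb{X}$ as
well. Thus, $\log[\frac{d\mu}{dx}] \in C_b(\mathbb{X})$. Also, note that for the purpose of
Theorem 3.1 we can always take $\mathbb{X}$ as being bounded for
$\{\mu_n\}_{n=1}^{\infty} \subseteq \mathcal{AC}(\mathbb{R}^k)$ and
$\mu \in \mathcal{AC}(\mathbb{R}^k)$ when

$$\bigcup_{n=1}^{\infty} K_n \subseteq K$$

and $K$ is bounded, where $K = \operatorname{support}(\frac{d\mu}{dx})$ and
$K_n = \operatorname{support}(\frac{d\mu_n}{dx})$ for each $n \in \{1, 2, \ldots\}$ (all the
supports being taken w.r.t. $\mathbb{R}^k$). Indeed, with $\mathbb{X} = K$ we then have
$\frac{d\mu}{dx} > 0$ on $\mathbb{X}$ and each $\mu_n$, the same as $\mu$, is concentrated on
$\mathbb{X}$, i.e., $\mu_n(\mathbb{X}) = 1$.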

Remark 3.5. The probability measures considered in the counterexample at the beginning
of this section satisfy all hypotheses of Theorem 3.1 and, in addition, $\mu_n \Rightarrow \mu$ as
$n \uparrow \infty$. However, similar to the differential entropy convergence failure, we have
$\mathcal{D}[\mu_n\|\mu] = 2\log[n] \uparrow \infty$ as $n \uparrow \infty$, i.e., the convergence
$\mathcal{D}[\mu_n\|\mu] \to 0$ as $n \uparrow \infty$ fails to hold. Also,

$$\sup_{A \in \mathcal{B}([0,1])} |\mu_n(A) - \mu(A)| \ge |\mu_n(A_n) - \mu(A_n)| = 1 - \frac{1}{n^2}$$

for each $n \in \{1, 2, \ldots\}$, and hence the convergence $\|\mu_n - \mu\|_V \to 0$ as
$n \uparrow \infty$ fails to hold too. In light of Theorem 3.1, we have failure of differential
entropy convergence due to failure of the corresponding convergence for the Kullback-Leibler
discriminant, due in turn, in light of Remark 3.2, to the respective failure of convergence in
variation.

Though weak convergence does not guarantee convergence of differential entropy, the
stronger convergence in variation does, under an appropriate boundedness condition
on the densities involved. The result is the following.



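Theorem 3.2. Let $\{\mu_n\}_{n=1}^{\infty} \subseteq \mathcal{AC}_+(\mathbb{X})$ and
$\mu \in \mathcal{AC}_+(\mathbb{X})$ be such that $\log[\frac{d\mu}{dx}] \in L^{\infty}(dx)$
and $\{\log[\frac{d\mu_n}{dx}]\}_{n=1}^{\infty} \subseteq L^{\infty}(dx)$. Assume that

$$M = \sup_{n \in \{1, 2, \ldots\}} \left\| \log\left[\frac{d\mu_n}{dx}\right] \right\|_{L^{\infty}(dx)} < \infty. \tag{8}$$

Then, $\{\mu_n\}_{n=1}^{\infty} \subseteq \mathbb{H}(\mathbb{X}) \cap \mathbb{H}(\mathbb{X}\|\mu)$,
$\mu \in \mathbb{H}(\mathbb{X})$ and, if $\|\mu_n - \mu\|_V \to 0$ as $n \uparrow \infty$, we have
both

$$\mathcal{D}[\mu_n\|\mu] \to 0 \quad \text{and} \quad \mathcal{H}[\mu_n] \to \mathcal{H}[\mu],$$

as $n \uparrow \infty$ as well.

Proof. First, since both

$$\log\left[\frac{d\mu}{dx}\right] \in L^{\infty}(dx)$$

and

$$\left\{ \log\left[\frac{d\mu_n}{dx}\right] \right\}_{n=1}^{\infty} \subseteq L^{\infty}(dx),$$

we have $\mu \in \mathbb{H}(\mathbb{X})$ and
$\{\mu_n\}_{n=1}^{\infty} \subseteq \mathbb{H}(\mathbb{X})$. Therefore, we may write

$$|\mathcal{H}[\mu_n] - \mathcal{H}[\mu]| = \left| \int_{\mathbb{X}} \log\left[\frac{d\mu_n}{dx}\right] d\mu_n - \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu \right| = \left| \int_{\mathbb{X}} \left( \frac{d\mu_n}{dx} \log\left[\frac{d\mu_n}{dx}\right] - \frac{d\mu}{dx} \log\left[\frac{d\mu}{dx}\right] \right) dx \right| \le \int_{\mathbb{X}} \left| \frac{d\mu_n}{dx} \log\left[\frac{d\mu_n}{dx}\right] - \frac{d\mu}{dx} \log\left[\frac{d\mu}{dx}\right] \right| dx. \tag{9}$$

Now, for each $n \in \{1, 2, \ldots\}$ we have

$$\left| \frac{d\mu_n}{dx}(x) \log\left[\frac{d\mu_n}{dx}(x)\right] - \frac{d\mu}{dx}(x) \log\left[\frac{d\mu}{dx}(x)\right] \right| \le \left| \log\left[\frac{d\mu_n}{dx}(x)\right] \right| \left| \frac{d\mu_n}{dx}(x) - \frac{d\mu}{dx}(x) \right| + \frac{d\mu}{dx}(x) \left| \log\left[\frac{d\mu_n}{d\mu}(x)\right] \right| \tag{10}$$

for Lebesgue-almost every $x \in \mathbb{X}$. For each $n \in \{1, 2, \ldots\}$ we also have, for
Lebesgue-almost every $x \in \mathbb{X}$ as well,

$$\left| \log\left[\frac{d\mu_n}{d\mu}(x)\right] \right| \le \frac{\log[M']}{M' - 1} \left| \frac{d\mu_n}{d\mu}(x) - 1 \right|, \tag{11}$$

since

$$\frac{d\mu_n}{d\mu}(x) \ge \left( 2^{M + \left\| \log\left[\frac{d\mu}{dx}\right] \right\|_{L^{\infty}(dx)}} \right)^{-1} = M' \in (0, 1)$$

for each $n \in \{1, 2, \ldots\}$ and Lebesgue-almost every $x \in \mathbb{X}$ too, and

$$|\log[a]| \le \frac{\log[a_0]}{a_0 - 1}\,|a - 1|$$

for all $a \in [a_0, \infty)$, with $a_0 \in (0, 1)$. We also have

$$\left| \frac{d\mu_n}{d\mu}(x) - 1 \right| \frac{d\mu}{dx}(x) = \left| \frac{d\mu_n}{dx}(x) - \frac{d\mu}{dx}(x) \right| \tag{12}$$

for each $n \in \{1, 2, \ldots\}$ and Lebesgue-almost every $x \in \mathbb{X}$. Hence, from
equations (10), (11) and (12) we conclude

$$\left| \frac{d\mu_n}{dx}(x) \log\left[\frac{d\mu_n}{dx}(x)\right] - \frac{d\mu}{dx}(x) \log\left[\frac{d\mu}{dx}(x)\right] \right| \le \left( M + \frac{\log[M']}{M' - 1} \right) \left| \frac{d\mu_n}{dx}(x) - \frac{d\mu}{dx}(x) \right| \tag{13}$$

for each $n \in \{1, 2, \ldots\}$ and Lebesgue-almost every $x \in \mathbb{X}$ as well, and
therefore, from equation (9),

$$|\mathcal{H}[\mu_n] - \mathcal{H}[\mu]| \le \left( M + \frac{\log[M']}{M' - 1} \right) \left\| \frac{d\mu_n}{dx} - \frac{d\mu}{dx} \right\|_{L^1(dx)}. \tag{14}$$

In the same way, since

$$\int_{\mathbb{X}} \left| \log\left[\frac{d\mu_n}{d\mu}\right] \right| d\mu_n \le M + \left\| \log\left[\frac{d\mu}{dx}\right] \right\|_{L^{\infty}(dx)} < \infty,$$

and therefore $\{\mu_n\}_{n=1}^{\infty} \subseteq \mathbb{H}(\mathbb{X}\|\mu)$, from equations (11)
and (12) it is easy to see that

$$\mathcal{D}[\mu_n\|\mu] \le \frac{1}{M'}\,\frac{\log[M']}{M' - 1} \left\| \frac{d\mu_n}{dx} - \frac{d\mu}{dx} \right\|_{L^1(dx)}. \tag{15}$$

The last part of the theorem then follows from equations (14) and (15) since, if
$\|\mu_n - \mu\|_V \to 0$ as $n \uparrow \infty$, then

$$\left\| \frac{d\mu_n}{dx} - \frac{d\mu}{dx} \right\|_{L^1(dx)} \ (= \|\mu_n - \mu\|_V) \to 0,$$

as $n \uparrow \infty$ as well. □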
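
The elementary logarithm bound used in the proof, $|\log[a]| \le \frac{\log[a_0]}{a_0 - 1}|a - 1|$ for $a \ge a_0$ with $a_0 \in (0,1)$, follows from concavity of the logarithm: the constant is the slope of the chord between $a_0$ and $1$. The sketch below checks it on a grid, with base-2 logarithms as in the paper; the value $a_0 = 0.1$, the grid, and the function names are assumptions for illustration.

```python
import math

def chord_constant(a0):
    """log2(a0) / (a0 - 1): slope of the chord of log2 between a0 and 1 (positive)."""
    return math.log2(a0) / (a0 - 1.0)

def inequality_holds(a0, a, tol=1e-12):
    """Check |log2(a)| <= chord_constant(a0) * |a - 1| up to a rounding tolerance."""
    return abs(math.log2(a)) <= chord_constant(a0) * abs(a - 1.0) + tol

a0 = 0.1
samples = [a0 + i * 0.01 for i in range(2000)]    # grid on [0.1, about 20]
all_hold = all(inequality_holds(a0, a) for a in samples)

# Below a0 the bound is not claimed, and it does fail, e.g. at a = 0.01.
holds_below_a0 = inequality_holds(a0, 0.01)
```

The failure below $a_0$ shows why the proof first needs a uniform lower bound on the ratio of densities before applying the inequality.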
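Remark 3.6. The reader can verify that the arguments leading to the proof of Theorem 3.2
require the supports of the densities $\frac{d\mu}{dx}$ and $\frac{d\mu_n}{dx}$,
$n \in \{1, 2, \ldots\}$, when regarded as densities in $\mathbb{R}^k$, to differ pairwise at most
by a Lebesgue-null set. The set $\mathbb{X}$ in the statement of the theorem can then be taken as
the intersection of all the aforementioned supports. Indeed, for such a
$\mu \in \mathcal{AC}(\mathbb{R}^k)$ and
$\{\mu_n\}_{n=1}^{\infty} \subseteq \mathcal{AC}(\mathbb{R}^k)$ we have, with $\mu_0 = \mu$,
$K_0 = \operatorname{support}(\frac{d\mu}{dx})$,
$K_n = \operatorname{support}(\frac{d\mu_n}{dx})$ for each $n \in \{1, 2, \ldots\}$ (all the
supports being taken w.r.t. $\mathbb{R}^k$) and

$$\mathbb{X} = \bigcap_{n=0}^{\infty} K_n,$$

that for each $m \in \{0, 1, 2, \ldots\}$

$$K_m \setminus \mathbb{X} = \bigcup_{n=0}^{\infty} (K_m \setminus K_n)$$

is a Lebesgue-null set, and therefore, since
$\{\mu_n\}_{n=0}^{\infty} \subseteq \mathcal{AC}(\mathbb{R}^k)$, that each element in the
sequence $\{\mu_n\}_{n=0}^{\infty}$ is concentrated on $\mathbb{X}$. Moreover,
$\{\mu_n\}_{n=0}^{\infty} \subseteq \mathcal{AC}_+(\mathbb{X})$.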

We have the following corollary to Theorems 3.1 and 3.2.

Corollary 3.1. Let $\{\mu_n\}_{n=1}^{\infty} \subseteq \mathcal{AC}_+(\mathbb{X})$ and
$\mu \in \mathcal{AC}(\mathbb{X})$ be such that $\frac{d\mu}{dx}(x) > 0$, for each
$x \in \mathbb{X}$, $\log[\frac{d\mu}{dx}] \in C_b(\mathbb{X})$ and
$\{\log[\frac{d\mu_n}{dx}]\}_{n=1}^{\infty} \subseteq L^{\infty}(dx)$. Assume that

$$\sup_{n \in \{1, 2, \ldots\}} \left\| \log\left[\frac{d\mu_n}{dx}\right] \right\|_{L^{\infty}(dx)} < \infty.$$

Then, $\{\mu_n\}_{n=1}^{\infty} \subseteq \mathbb{H}(\mathbb{X}) \cap \mathbb{H}(\mathbb{X}\|\mu)$,
$\mu \in \mathbb{H}(\mathbb{X})$ and the following assertions are equivalent.

(i) $\mu_n \Rightarrow \mu$ and $\mathcal{H}[\mu_n] \to \mathcal{H}[\mu]$ as $n \uparrow \infty$.
(ii) $\mathcal{D}[\mu_n\|\mu] \to 0$ as $n \uparrow \infty$.
(iii) $\|\mu_n - \mu\|_V \to 0$ as $n \uparrow \infty$.

Proof. The result follows from Theorems 3.1 and 3.2 in view of Remark 3.2 and Lemma 2.2. □

4 Pointwise Convergence and Differential Entropy Convergence

In this section we provide a general result for convergence of Shannon Differential Entropy,
and Kullback-Leibler discriminant as well, under an appropriate pointwise convergence
condition. We take into account both compactly and non-compactly supported densities.
As mentioned in Section 1, the proof is based on exploiting absolute continuity properties
of measures, in conjunction with a suitable boundedness condition and the dominated
convergence theorem. The result is the following.



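Theorem 4.1. Let $\mu \in \mathbb{H}(\mathbb{X})$ and
$\{\mu_n\}_{n=1}^{\infty} \subseteq \mathcal{AC}(\mathbb{X}\|\mu)$ be such that
$\frac{d\mu_n}{d\mu}(x) \to 1$ as $n \uparrow \infty$, for $\mu$-almost every
$x \in \mathbb{X}$, and $\{\frac{d\mu_n}{d\mu}\}_{n=1}^{\infty} \subseteq L^{\infty}(d\mu)$.
Assume that

$$M = \sup_{n \in \{1, 2, \ldots\}} \left\| \frac{d\mu_n}{d\mu} \right\|_{L^{\infty}(d\mu)} < \infty. \tag{16}$$

Then, $\{\mu_n\}_{n=1}^{\infty} \subseteq \mathbb{H}(\mathbb{X}) \cap \mathbb{H}(\mathbb{X}\|\mu)$
and we have both

$$\mathcal{D}[\mu_n\|\mu] \to 0 \quad \text{and} \quad \mathcal{H}[\mu_n] \to \mathcal{H}[\mu]$$

as $n \uparrow \infty$.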
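
A quick numerical illustration of the theorem, under assumptions chosen only for this example: take $\mu$ to be Lebesgue measure on $[0,1]$ (so $\mathcal{H}[\mu] = 0$) and let $\mu_n$ have density $1 + \sin(2\pi x)/(2n)$ with respect to $\mu$. These densities converge pointwise to $1$ and are uniformly bounded by $1.5$, so the hypotheses hold, and since $\frac{d\mu}{dx} = 1$ we have $\mathcal{H}[\mu_n] = -\mathcal{D}[\mu_n\|\mu]$. The midpoint rule and the function names are illustrative choices.

```python
import math

def ratio(n, x):
    """Hypothetical density d(mu_n)/d(mu) on [0, 1]; tends to 1 and is bounded by 1.5."""
    return 1.0 + math.sin(2.0 * math.pi * x) / (2.0 * n)

def relative_entropy_bits(n, steps=20_000):
    """Midpoint-rule approximation of D[mu_n||mu] = integral of r*log2(r) d(mu)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        r = ratio(n, (i + 0.5) * h)
        total += r * math.log2(r) * h
    return total

divergences = [relative_entropy_bits(n) for n in (1, 2, 4, 8, 16)]
entropies = [-d for d in divergences]   # H[mu_n], since d(mu)/dx = 1 here
```

The computed discriminants decrease towards zero and the entropies increase towards $\mathcal{H}[\mu] = 0$, in line with the two convergences asserted by the theorem.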



Proof. First, for each ?T,G{l,2,...}we have 



log 



dfi 

dx 



dfln 



'X 

< M 

< oo 



log 



dfb 

dx 

log 



dfi 



dfi 



dfi 
dx 



dfi 



{fi e EI(X)). Condition in the statement of the theorem also implies that log[^]}^ 
L°°{dfi) with 



M' = sup 

nG{l,2,...} 



dfin 



dfi 



log 



dfin 

dfi 



< oo, 



(17) 



and therefore, {fLn\'^=i C EI(X||yu). Indeed, for each n G {1, 2, . . .}, 



log 



dfin 

dfi 



dfin 



log 



dfin 

dfi 



dfin 

dfi 



dfi< M'dfi, 
Jx 



:i8) 
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and M'dfi = M'/i(X) = M' < oo. Hence, for each n G {1, 2, . . .} we also have 



log 



dx 



dfXn < 

< OO 



log 



dUn 

djj, 



d^n + 



log 



dn 

dx 



dun 



thus {fin}'^=i C EI(X), and we may write 





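\[ \left| \int_{\mathbb{X}} \log\left[\frac{d\mu_n}{dx}\right] d\mu_n - \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu \right| \le \left| \int_{\mathbb{X}} \log\left[\frac{d\mu_n}{dx}\right] d\mu_n - \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu_n \right| + \left| \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu_n - \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu \right|, \]

i.e.,

\[ \left| \mathcal{H}[\mu_n] - \mathcal{H}[\mu] \right| \le \mathcal{D}[\mu_n\|\mu] + \left| \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu_n - \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu \right| \tag{19} \]

for each $n \in \{1,2,\ldots\}$ as well. But,

\[ \mathcal{D}[\mu_n\|\mu] = \int_{\mathbb{X}} \log\left[\frac{d\mu_n}{d\mu}\right] d\mu_n = \int_{\mathbb{X}} \log\left[\frac{d\mu_n}{d\mu}\right] \frac{d\mu_n}{d\mu} \, d\mu \tag{20} \]

for each $n \in \{1,2,\ldots\}$ and, as already used in equation (18), from (17) it follows that, for
each $n \in \{1,2,\ldots\}$ too,

\[ \left| \frac{d\mu_n}{d\mu}(x) \log\left[\frac{d\mu_n}{d\mu}(x)\right] \right| \le M' \]

for $\mu$-almost every $x \in \mathbb{X}$. Since also $\left\{\frac{d\mu_n}{d\mu}\log\left[\frac{d\mu_n}{d\mu}\right]\right\}_{n=1}^{\infty}$ converges pointwise $\mu$-almost everywhere
to $0$ on $\mathbb{X}$ as $n \uparrow \infty$, where $0(x) = 0$, $x \in \mathbb{X}$, by Lebesgue's Dominated Convergence
Theorem (see for example [15]) we conclude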



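\[ \int_{\mathbb{X}} \log\left[\frac{d\mu_n}{d\mu}\right] \frac{d\mu_n}{d\mu} \, d\mu \to 0, \tag{21} \]

as $n \uparrow \infty$ as well. The claimed convergence $\mathcal{D}[\mu_n\|\mu] \to 0$ as $n \uparrow \infty$ then follows from
equations (20) and (21). Now, to establish the remaining claimed convergence $\mathcal{H}[\mu_n] \to \mathcal{H}[\mu]$
as $n \uparrow \infty$, we note that for each $n \in \{1,2,\ldots\}$ we also have

\[ \left| \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu_n - \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu \right| = \left| \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] \frac{d\mu_n}{d\mu} \, d\mu - \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] 1 \, d\mu \right| \le \int_{\mathbb{X}} \left| \log\left[\frac{d\mu}{dx}\right] \right| \left| \frac{d\mu_n}{d\mu} - 1 \right| d\mu \tag{22} \]

(recall $1(x) = 1$, $x \in \mathbb{X}$). But, since $\left\{\frac{d\mu_n}{d\mu}\right\}_{n=1}^{\infty}$ converges pointwise $\mu$-almost everywhere to
$1$ on $\mathbb{X}$ as $n \uparrow \infty$, we conclude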



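\[ \left| \log\left[\frac{d\mu}{dx}\right] \right| \left| \frac{d\mu_n}{d\mu} - 1 \right| \to 0, \]

$\mu$-almost everywhere on $\mathbb{X}$ and as $n \uparrow \infty$ as well. In addition, since we obviously also have
$\left\{\frac{d\mu_n}{d\mu} - 1\right\}_{n=1}^{\infty} \subset L^{\infty}(d\mu)$ and

\[ M'' = \sup_{n \in \{1,2,\ldots\}} \left\| \frac{d\mu_n}{d\mu} - 1 \right\|_{L^{\infty}(d\mu)} \le \sup_{n \in \{1,2,\ldots\}} \left\| \frac{d\mu_n}{d\mu} \right\|_{L^{\infty}(d\mu)} + 1 = M + 1 < \infty, \]

we conclude that, for each $n \in \{1,2,\ldots\}$,

\[ \left| \log\left[\frac{d\mu}{dx}(x)\right] \right| \left| \frac{d\mu_n}{d\mu}(x) - 1 \right| \le M'' \left| \log\left[\frac{d\mu}{dx}(x)\right] \right| \]

for $\mu$-almost every $x \in \mathbb{X}$. But,

\[ \int_{\mathbb{X}} M'' \left| \log\left[\frac{d\mu}{dx}\right] \right| d\mu = M'' \int_{\mathbb{X}} \left| \log\left[\frac{d\mu}{dx}\right] \right| d\mu < \infty \]

($\mu \in \mathbb{H}(\mathbb{X})$). Hence, once again by Lebesgue's Dominated Convergence Theorem we conclude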



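\[ \int_{\mathbb{X}} \left| \log\left[\frac{d\mu}{dx}\right] \right| \left| \frac{d\mu_n}{d\mu} - 1 \right| d\mu \to 0 \]

as $n \uparrow \infty$, and therefore from equation (22) we also have

\[ \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu_n \to \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu, \tag{23} \]

as $n \uparrow \infty$ as well. The claimed convergence $\mathcal{H}[\mu_n] \to \mathcal{H}[\mu]$ as $n \uparrow \infty$ now follows from
equations (19), (20), (21) and (23), proving the theorem. □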



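Remark 4.1. If $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{AC}(\mathbb{X})$ and $\mu \in \mathcal{AC}_+(\mathbb{X})$, then $\mu$-almost everywhere pointwise
convergence $\frac{d\mu_n}{d\mu} \to 1$ as $n \uparrow \infty$ is equivalent to Lebesgue-almost everywhere pointwise
convergence $\frac{d\mu_n}{dx} \to \frac{d\mu}{dx}$ as $n \uparrow \infty$ as well (both on $\mathbb{X}$, of course).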

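Remark 4.2. If $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{AC}(\mathbb{X})$ and $\mu \in \mathcal{AC}(\mathbb{X})$ with $\left\{\frac{d\mu_n}{dx}\right\}_{n=1}^{\infty}$ converging pointwise
Lebesgue-almost everywhere to $\frac{d\mu}{dx}$ on $\mathbb{X}$ as $n \uparrow \infty$, then $\|\mu_n - \mu\|_V \to 0$, as $n \uparrow \infty$ as well.
Indeed,

\[ \|\mu_n - \mu\|_V = \left\| \frac{d\mu_n}{dx} - \frac{d\mu}{dx} \right\|_{L^1(dx)} = \int_{\mathbb{X}} \left| \frac{d\mu_n}{dx} - \frac{d\mu}{dx} \right| dx \to 0 \]

as $n \uparrow \infty$, the convergence following from Scheffé's Lemma, [16, Lemma 5.10, p. 55].
Therefore, when $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{AC}_+(\mathbb{X})$ and $\mu \in \mathcal{AC}_+(\mathbb{X})$, by going from convergence in
variation in Theorem 3.2 to pointwise convergence of the corresponding densities in Theorem
4.1 (see Remark 4.1 above), we are able to relax the boundedness condition of Theorem 3.2
to condition (16). Indeed, for $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{AC}_+(\mathbb{X})$ and $\mu \in \mathcal{AC}_+(\mathbb{X})$ satisfying
$\log\left[\frac{d\mu}{dx}\right] \in L^{\infty}(dx)$ and $\left\{\log\left[\frac{d\mu_n}{dx}\right]\right\}_{n=1}^{\infty} \subset L^{\infty}(dx)$ with $\sup_{n \in \{1,2,\ldots\}} \left\|\log\left[\frac{d\mu_n}{dx}\right]\right\|_{L^{\infty}(dx)} < \infty$, we
have $\left\{\frac{d\mu_n}{d\mu}\right\}_{n=1}^{\infty} \subset L^{\infty}(d\mu)$ and

\[ \sup_{n \in \{1,2,\ldots\}} \left\| \frac{d\mu_n}{d\mu} \right\|_{L^{\infty}(d\mu)} \le 2^{\, \sup_{n \in \{1,2,\ldots\}} \left\|\log\left[\frac{d\mu_n}{dx}\right]\right\|_{L^{\infty}(dx)} + \left\|\log\left[\frac{d\mu}{dx}\right]\right\|_{L^{\infty}(dx)}} < \infty, \]

the boundedness condition of Theorem 3.2 implying then (16).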
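The Scheffé step in Remark 4.2 can be checked numerically on a simple family. The sketch below is an illustration under assumptions that are not in the original: it takes $\mu = N(0,1)$ and $\mu_n = N(1/n, 1)$, whose densities converge pointwise, approximates the $L^1(dx)$ distance with a midpoint rule on $[-15, 15]$, and compares it with the known closed form $2\,\mathrm{erf}\!\left(|m|/(2\sqrt{2})\right)$ for two unit-variance Gaussians whose means differ by $m$.

```python
import math

def normal_pdf(x, mean):
    """Density of N(mean, 1)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def variation_distance(mean, lo=-15.0, hi=15.0, steps=30000):
    """Midpoint-rule approximation of the integral of |f_mean - f_0| dx."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += abs(normal_pdf(x, mean) - normal_pdf(x, 0.0))
    return h * total

def closed_form(mean):
    """Exact L1 distance between the densities of N(mean, 1) and N(0, 1)."""
    return 2.0 * math.erf(abs(mean) / (2.0 * math.sqrt(2.0)))

if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, variation_distance(1.0 / n), closed_form(1.0 / n))
```

The distance decreases toward 0 as the means approach each other, in line with the convergence in variation claimed in the remark.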



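Remark 4.3. For any $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{AC}(\mathbb{X})$ and $\mu \in \mathcal{AC}_+(\mathbb{X})$, condition (16) in Theorem 4.1
reads as

\[ \frac{d\mu_n}{dx}(x) \left[\frac{d\mu}{dx}(x)\right]^{-1} \le M < \infty \tag{24} \]

for each $n \in \{1,2,\ldots\}$ and Lebesgue-almost every $x \in \mathbb{X}$, and therefore, as the reader can
easily verify (note $M \ge 1$ necessarily), we have

\[ \frac{d\mu_n}{dx}(x) \log\left[\frac{d\mu_n}{dx}(x)\right] \le M \max\{\psi_1(x), \psi_2(x)\} \]

for each $n \in \{1,2,\ldots\}$ and Lebesgue-almost every $x \in \mathbb{X}$ as well, where

\[ \psi_1(x) = \frac{d\mu}{dx}(x) \log[M] \]

and

\[ \psi_2(x) = \frac{d\mu}{dx}(x) \log[M] + \frac{d\mu}{dx}(x) \log\left[\frac{d\mu}{dx}(x)\right]. \]

Thus, since also $y\log[y] \ge (-e\ln[2])^{-1}$ for all $y \in \mathbb{R}_+$ (recall $0\log[0] = 0[-\infty] = 0$ by
convention), condition (24) then implies the existence of $C_0, C_1, C_2 \in \mathbb{R}_+$, with $C_0 > 0$
necessarily if $\mathbb{X}$ has infinite Lebesgue measure (easy to check), such that for each $n \in \{1,2,\ldots\}$

\[ \left| \frac{d\mu_n}{dx}(x) \log\left[\frac{d\mu_n}{dx}(x)\right] \right| \le f(x), \tag{25} \]

for Lebesgue-almost every $x \in \mathbb{X}$, where

\[ f(x) = C_0 + C_1 \frac{d\mu}{dx}(x) + C_2 \frac{d\mu}{dx}(x) \left| \log\left[\frac{d\mu}{dx}(x)\right] \right|. \]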

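However, even with $\{\mu_n\}_{n=1}^{\infty} \subset \mathbb{H}(\mathbb{X})$, $\mu \in \mathbb{H}(\mathbb{X}) \cap \mathcal{AC}_+(\mathbb{X})$ and $\left\{\frac{d\mu_n}{dx}\right\}_{n=1}^{\infty}$ converging pointwise
Lebesgue-almost everywhere to $\frac{d\mu}{dx}$ on $\mathbb{X}$ as $n \uparrow \infty$ (see Remark 4.1), condition (25)
cannot be used in the dominated convergence theorem to conclude the convergence

\[ \int_{\mathbb{X}} \log\left[\frac{d\mu_n}{dx}\right] d\mu_n = \int_{\mathbb{X}} \log\left[\frac{d\mu_n}{dx}\right] \frac{d\mu_n}{dx} \, dx \to \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] \frac{d\mu}{dx} \, dx = \int_{\mathbb{X}} \log\left[\frac{d\mu}{dx}\right] d\mu \]

as $n \uparrow \infty$, if $\mathbb{X}$ has infinite Lebesgue measure. Indeed,

\[ \int_{\mathbb{X}} f \, dx \ge \int_{\mathbb{X}} C_0 \, dx = C_0 \int_{\mathbb{X}} dx = \infty \]

in that case. This is the advantage of considering integrals w.r.t. $d\mu$ (instead of $dx$) in
the arguments leading to the proof of Theorem 4.1.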
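The step that Remark 4.3 leaves to the reader can be sketched as follows. The shorthand $f_n = \frac{d\mu_n}{dx}(x)$ and $f = \frac{d\mu}{dx}(x)$ is introduced here for brevity, and the derivation assumes that $\psi_1 = f\log[M]$ and $\psi_2 = f\log[M] + f\log[f]$, which is our reading of the remark rather than a verbatim restatement.

```latex
% Assumption: f_n \le M f with M \ge 1 (condition (24)), logarithms in base 2.
\begin{align*}
\text{If } f_n \ge 1:\quad
  & 0 \le f_n \log[f_n] \le M f \log[M f]
    = M\left(f\log[M] + f\log[f]\right) = M\,\psi_2,\\
  & \text{since } y \mapsto y\log[y] \text{ is increasing on } [1,\infty)
    \text{ and } M f \ge f_n \ge 1.\\[4pt]
\text{If } f_n < 1:\quad
  & f_n \log[f_n] \le 0 \le M f \log[M] = M\,\psi_1,
    \text{ since } M \ge 1.\\[4pt]
\text{Hence:}\quad
  & f_n \log[f_n] \le M \max\{\psi_1, \psi_2\}.
\end{align*}
```

Combining this upper bound with the lower bound $y\log[y] \ge (-e\ln[2])^{-1}$ gives a two-sided bound of the form (25), with a strictly positive constant term coming from the lower bound.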



5 Discrete Alphabet Sources 

In this section we consider discrete alphabet sources. We show how all convergence results
become straightforward for finitely supported probability measures, and we also provide
results for the infinitely supported case, by exploiting the equivalence between weak
convergence and convergence in variation in this setting.






Though most of the definitions in the previous sections include the discrete case as
a particular case when no reference to $\mathcal{AC}(\mathbb{X})$ is made, by considering then discretely
supported probability measures, for the sake of precision we briefly go through all the relevant
concepts before stating the results.

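Throughout this section we consider, specifically,

\[ \mathbb{X} = \{x_i\}_{i \in \mathbb{I}} \subset \mathbb{R}^k \quad \text{with} \quad \mathbb{I} \subseteq \{1, 2, \ldots\} \]

and, as before, $\mathbb{R}^k$ the $k$-dimensional Euclidean space. (Note that $\mathbb{I}$ is allowed to be the
whole of $\{1,2,\ldots\}$.) Accordingly, $\mathcal{S}(\mathbb{X})$ denotes the collection of all subsets of $\mathbb{X}$ and
$\mathcal{P}(\mathbb{X})$ the collection of all probability measures on $(\mathbb{X}, \mathcal{S}(\mathbb{X}))$. A measure $\mu \in \mathcal{P}(\mathbb{X})$ is now
characterized by the sequence $\{p_i^{\mu}\}_{i \in \mathbb{I}} \subset [0, 1]$, satisfying the normalization condition

\[ \sum_{i \in \mathbb{I}} p_i^{\mu} = 1, \]

given by $p_i^{\mu} = \mu(\{x_i\})$, $i \in \mathbb{I}$. To any sequence $\{a_i\}_{i \in \mathbb{I}} \subset \mathbb{R}$ we associate the mapping
$a : \mathbb{X} \to \mathbb{R}$ by setting $a(x_i) = a_i$ for each $i \in \mathbb{I}$. We shall use the same notation as in the
previous sections to denote now (recall the conventions $\log[0] = -\infty$ and $0[\pm\infty] = 0$)

\[ \mathbb{H}(\mathbb{X}) = \left\{ \mu \in \mathcal{P}(\mathbb{X}) : \{\log[p_i^{\mu}]\}_{i \in \mathbb{I}} \in l^1(\mu) \right\}, \]

where

\[ l^1(\mu) = \left\{ \{a_i\}_{i \in \mathbb{I}} \subset \mathbb{R} : \sum_{i \in \mathbb{I}} |a_i| \, p_i^{\mu} < \infty \right\} \]

for $\mu \in \mathcal{P}(\mathbb{X})$. Note that $\{a_i\}_{i \in \mathbb{I}} \in l^1(\mu)$ if and only if $a \in L^1(d\mu)$. In fact,

\[ \int_{\mathbb{X}} a \, d\mu = \sum_{i \in \mathbb{I}} a_i \, p_i^{\mu} \]

and

\[ \left\| \{a_i\}_{i \in \mathbb{I}} \right\|_{l^1(\mu)} = \sum_{i \in \mathbb{I}} |a_i| \, p_i^{\mu} = \|a\|_{L^1(d\mu)} \]

for $\{a_i\}_{i \in \mathbb{I}} \in l^1(\mu)$ or, equivalently, for $a \in L^1(d\mu)$.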

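Remark 5.1. For any given $\mu \in \mathcal{P}(\mathbb{X})$, Banach spaces $(l^p(\mu), \|\cdot\|_{l^p(\mu)})$ can be considered
for each $p \in [1, \infty]$, similarly to Section 2, provided sequences coinciding at each $i \in \mathbb{I}$
for which $p_i^{\mu} > 0$ (i.e., $\mu$-almost everywhere) are treated as equivalent.

Consider the measure $\mu \in \mathbb{H}(\mathbb{X})$. Then,

\[ \mathcal{H}[\mu] = -\sum_{i \in \mathbb{I}} p_i^{\mu} \log[p_i^{\mu}] \]

is the Shannon Entropy of $\mu$. Also, given $\mu \in \mathcal{P}(\mathbb{X})$, $\mathcal{AC}(\mathbb{X}\|\mu)$ denotes the set of all
probability measures $\sigma \in \mathcal{P}(\mathbb{X})$ that are absolutely continuous w.r.t. $\mu$, i.e., satisfying the
condition $p_i^{\sigma} = 0$ whenever $p_i^{\mu} = 0$. In the same way,

\[ \mathbb{H}(\mathbb{X}\|\mu) = \left\{ \sigma \in \mathcal{AC}(\mathbb{X}\|\mu) : \left\{ \log\left[\frac{p_i^{\sigma}}{p_i^{\mu}}\right] \right\}_{i \in \mathbb{I}} \in l^1(\sigma) \right\}, \]

with the standard convention $0\log\left[\frac{0}{0}\right] = 0$ (motivated by continuity). Finally, if $\sigma \in \mathbb{H}(\mathbb{X}\|\mu)$, then

\[ \mathcal{D}[\sigma\|\mu] = \sum_{i \in \mathbb{I}} p_i^{\sigma} \log\left[\frac{p_i^{\sigma}}{p_i^{\mu}}\right] \]

is the Shannon Relative Entropy between $\sigma$ and $\mu$, or equivalently the Kullback-Leibler
Discriminant between $\sigma$ and $\mu$. It is worth emphasizing that

\[ \mathcal{D}[\sigma\|\mu] \ge 0 \]

for any $\mu \in \mathcal{P}(\mathbb{X})$ and $\sigma \in \mathcal{AC}(\mathbb{X}\|\mu)$, with equality if and only if $p^{\sigma} = p^{\mu}$ $\sigma$-almost
everywhere, i.e., if and only if $p_i^{\sigma} = p_i^{\mu}$ for each $i \in \mathbb{I}$ such that $p_i^{\sigma} > 0$.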
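The discrete definitions above translate directly into code. The following is a minimal sketch for finitely listed probability vectors, with logarithms in base 2 and the conventions $0\log[0] = 0$ and $0\log[0/0] = 0$ built in. The function names are hypothetical, and returning infinity when absolute continuity fails is an assumption of this sketch: the text only defines $\mathcal{D}[\sigma\|\mu]$ for $\sigma \in \mathbb{H}(\mathbb{X}\|\mu)$.

```python
import math

def entropy_bits(p):
    """Shannon entropy -sum_i p_i log2(p_i), using 0 log 0 = 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0.0)

def kl_bits(sigma, mu):
    """Relative entropy D[sigma || mu] in bits for equal-length vectors.

    Returns math.inf when sigma is not absolutely continuous w.r.t. mu
    (an assumed convention, not taken from the text).
    """
    total = 0.0
    for s, m in zip(sigma, mu):
        if s == 0.0:
            continue  # 0 log[0/m] = 0, which also covers 0 log[0/0] = 0
        if m == 0.0:
            return math.inf  # p_i^sigma > 0 while p_i^mu = 0
        total += s * math.log2(s / m)
    return total

if __name__ == "__main__":
    print(entropy_bits([0.5, 0.25, 0.25]))
    print(kl_bits([0.5, 0.5], [0.25, 0.75]))
```

The non-negativity of the discriminant, and the fact that it vanishes when both arguments coincide, can be checked on small examples with these two functions.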

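Weak convergence of $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{P}(\mathbb{X})$ to $\mu \in \mathcal{P}(\mathbb{X})$ is now characterized as follows. We
have $\mu_n \Rightarrow \mu$ as $n \uparrow \infty$ if and only if

\[ \sum_{i \in \mathbb{I}} f(x_i) \, p_i^{\mu_n} \to \sum_{i \in \mathbb{I}} f(x_i) \, p_i^{\mu} \tag{26} \]

as $n \uparrow \infty$ as well, for each bounded, real-valued function $f$ on $\mathbb{X}$.
Distance in variation between $\sigma_1 \in \mathcal{P}(\mathbb{X})$ and $\sigma_2 \in \mathcal{P}(\mathbb{X})$ is

\[ \|\sigma_1 - \sigma_2\|_V = \left\| \{p_i^{\sigma_1} - p_i^{\sigma_2}\}_{i \in \mathbb{I}} \right\|_{l^1(\delta)}, \]

where $\delta$ denotes the counting measure on $(\mathbb{X}, \mathcal{S}(\mathbb{X}))$, i.e., $\delta(\{x_i\}) = 1$ for each $i \in \mathbb{I}$ and
$\delta(A) = \sum_{i : x_i \in A} \delta(\{x_i\})$ for each $A \in \mathcal{S}(\mathbb{X})$. (As before, note $\{p_i^{\sigma_1} - p_i^{\sigma_2}\}_{i \in \mathbb{I}} \in l^1(\delta)$ for
all $\sigma_1, \sigma_2 \in \mathcal{P}(\mathbb{X})$.) The corresponding convergence in variation of $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{P}(\mathbb{X})$ to
$\mu \in \mathcal{P}(\mathbb{X})$, $\|\mu_n - \mu\|_V \to 0$ as $n \uparrow \infty$, takes place if and only if

\[ \sum_{i \in \mathbb{I}} \left| p_i^{\mu_n} - p_i^{\mu} \right| \to 0 \]

as $n \uparrow \infty$ as well, since

\[ \left\| \{p_i^{\mu_n} - p_i^{\mu}\}_{i \in \mathbb{I}} \right\|_{l^1(\delta)} = \sum_{i \in \mathbb{I}} \left| p_i^{\mu_n} - p_i^{\mu} \right|. \]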

In the discrete setting, the relationship between weak convergence and convergence in
variation in Lemma 2.2 can be strengthened, as stated in the following result.

Lemma 5.1. Let $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{P}(\mathbb{X})$ and $\mu \in \mathcal{P}(\mathbb{X})$. Then, we have $\mu_n \Rightarrow \mu$ as $n \uparrow \infty$
if and only if $\|\mu_n - \mu\|_V \to 0$, as $n \uparrow \infty$ as well. Moreover, both modes of
convergence take place if and only if $p_i^{\mu_n} \to p_i^{\mu}$ as $n \uparrow \infty$ for each $i \in \mathbb{I}$, i.e., both the
topology of weak convergence and convergence in variation are equivalent to the topology of
coordinatewise convergence of the sequence of vectors $\{(p_i^{\mu_n})_{i \in \mathbb{I}}\}_{n=1}^{\infty}$ to the vector $(p_i^{\mu})_{i \in \mathbb{I}}$ as
$n \uparrow \infty$ (equivalently, to the topology of pointwise convergence of $\{p^{\mu_n}\}_{n=1}^{\infty}$ to $p^{\mu}$ on $\mathbb{X}$ as
$n \uparrow \infty$).

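Proof. We obviously have that $\mu_n \Rightarrow \mu$ as $n \uparrow \infty$ implies $p_i^{\mu_n} \to p_i^{\mu}$, as $n \uparrow \infty$ as well, for
each $i \in \mathbb{I}$. Indeed, we just need to consider equation (26) with $f_i : \mathbb{X} \to \{0, 1\}$ defined,
for each $i \in \mathbb{I}$, by letting $f_i(x) = 1$ if $x = x_i$ and $f_i(x) = 0$ if $x \in \mathbb{X} \setminus \{x_i\}$. Now, since

\[ \|\mu_n - \mu\|_V = \left\| p^{\mu_n} - p^{\mu} \right\|_{L^1(d\delta)} = \int_{\mathbb{X}} \left| p^{\mu_n} - p^{\mu} \right| d\delta, \]

if the sequence $\{p^{\mu_n}\}_{n=1}^{\infty}$ converges pointwise to $p^{\mu}$ on $\mathbb{X}$ as $n \uparrow \infty$, then Scheffé's Lemma
gives us the convergence $\|\mu_n - \mu\|_V \to 0$, as $n \uparrow \infty$ too, the same as in the differential case
(see Remark 4.2). The lemma then follows from Lemma 2.2. □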
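Lemma 5.1 can be illustrated on an infinitely supported family. The sketch below uses geometric laws $p_i = (1-q)q^{i-1}$ with parameters $q_n = 0.5 + 0.25/n$ approaching $q = 0.5$, so that each coordinate converges. The family, the parameter sequence, and the truncation of the infinite sum to 2000 terms (beyond which the remaining mass is far below double precision) are assumptions made for illustration.

```python
def geometric(q, size):
    """First `size` probabilities of the geometric law (1 - q) q^(i-1), i >= 1."""
    return [(1.0 - q) * q ** i for i in range(size)]

def variation_distance(p, r):
    """Sum over coordinates of |p_i - r_i| for two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(p, r))

def distance_to_limit(n, size=2000):
    """Truncated variation distance between the laws with q_n and q = 0.5."""
    return variation_distance(geometric(0.5 + 0.25 / n, size),
                              geometric(0.5, size))

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, distance_to_limit(n))
```

Coordinatewise convergence of the probability vectors is accompanied here by a variation distance that shrinks toward 0, as the lemma asserts.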

Remark 5.2. In the differential setting and from Remark 4.2 and Lemma 2.2 we have
the chain of implications: pointwise convergence of densities (Lebesgue-almost everywhere
pointwise convergence in fact) $\Rightarrow$ convergence in variation $\Rightarrow$ weak convergence. As Lemma
5.1 shows, the corresponding three modes of convergence in the discrete setting are indeed
equivalent.

In view of Lemma 5.1 it is a straightforward exercise to check that in the case when
the set $\mathbb{X}$ (equivalently the index set $\mathbb{I}$) can be taken to be finite (i.e., when the supports
of all probability measures involved are contained in a finite set), the convergence $\mu_n \Rightarrow \mu$
as $n \uparrow \infty$ implies both

\[ \mathcal{H}[\mu_n] \to \mathcal{H}[\mu] \quad \text{and} \quad \mathcal{D}[\mu_n\|\mu] \to 0, \]

as $n \uparrow \infty$ as well; in fact, by Remark 3.2, the convergences $\mu_n \Rightarrow \mu$ and $\mathcal{D}[\mu_n\|\mu] \to 0$ as
$n \uparrow \infty$ are then equivalent (of course with $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{AC}(\mathbb{X}\|\mu)$ for the convergence
$\mathcal{D}[\mu_n\|\mu] \to 0$ as $n \uparrow \infty$).

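Given $\mu \in \mathcal{P}(\mathbb{X})$ and $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{P}(\mathbb{X})$, the set $\mathbb{X}$ can be made into a finite set whenever
$\mu$ is finitely supported and $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{AC}(\mathbb{X}\|\mu)$, by just redefining it as $\mathbb{X}_{\mu}$ with

\[ \mathbb{X}_{\mu} = \operatorname{support}(p^{\mu}) = \{x \in \mathbb{X} : p^{\mu}(x) > 0\} = \{x_i\}_{i \in \mathbb{I}_{\mu}} \]

and $\mathbb{I}_{\mu} = \{i \in \mathbb{I} : p_i^{\mu} > 0\}$. Note that if $\mu \in \mathcal{P}(\mathbb{X})$ is finitely supported and $\{\mu_n\}_{n=1}^{\infty} \subset
\mathcal{AC}(\mathbb{X}\|\mu)$, then $\mu \in \mathbb{H}(\mathbb{X})$ and $\{\mu_n\}_{n=1}^{\infty} \subset \mathbb{H}(\mathbb{X}\|\mu)$. The discrete setting versions of
Theorems 3.1 and 3.2 and Corollary 3.1 are trivial in that case. They cannot be stated for
infinitely supported $\mu \in \mathcal{P}(\mathbb{X})$, however, as is clear from the following remark.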

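Remark 5.3. Unlike in the differential setting (see Remark 3.3), in the discrete setting we
have for $\mu \in \mathcal{P}(\mathbb{X})$ that $p_i^{\mu} \to 0$ as $i \to \infty$, $i \in \mathbb{I}_{\mu}$, whenever $\mathbb{I}_{\mu}$ (equivalently $\mathbb{X}_{\mu}$) is infinite
($\sum_{i \in \mathbb{I}} p_i^{\mu} = 1 < \infty$), and therefore the subsequence $\{\log[p_i^{\mu}]\}_{i \in \mathbb{I}_{\mu}}$ cannot be bounded in that
case (even when $\mathbb{X}_{\mu}$ is a bounded subset of $\mathbb{R}^k$).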

We consider the general case, covering infinitely supported probability measures, in
the following theorem (which is the discrete setting version of Theorem 4.1)
and two corresponding corollaries. Though the proof of the theorem follows by arguments
similar to those in the proof of Theorem 4.1, we include here the main
steps in order to make clear the connection between both settings.

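Theorem 5.1. Let $\mu \in \mathbb{H}(\mathbb{X})$ and $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{AC}(\mathbb{X}\|\mu)$ be such that $p_i^{\mu_n} \to p_i^{\mu}$ as $n \uparrow \infty$,
for each $i \in \mathbb{I}_{\mu}$, and

\[ M = \sup_{n \in \{1,2,\ldots\},\, i \in \mathbb{I}_{\mu}} \frac{p_i^{\mu_n}}{p_i^{\mu}} < \infty. \]

Then, $\{\mu_n\}_{n=1}^{\infty} \subset \mathbb{H}(\mathbb{X}) \cap \mathbb{H}(\mathbb{X}\|\mu)$ and we have both

\[ \mathcal{D}[\mu_n\|\mu] \to 0 \quad \text{and} \quad \mathcal{H}[\mu_n] \to \mathcal{H}[\mu] \]

as $n \uparrow \infty$.

Proof. As in the proof of Theorem 4.1 it is easy to see that $\{\mu_n\}_{n=1}^{\infty} \subset \mathbb{H}(\mathbb{X}) \cap \mathbb{H}(\mathbb{X}\|\mu)$,
and that

\[ \left| \mathcal{H}[\mu_n] - \mathcal{H}[\mu] \right| \le \mathcal{D}[\mu_n\|\mu] + \left| \sum_{i \in \mathbb{I}} \log[p_i^{\mu}] \left( p_i^{\mu_n} - p_i^{\mu} \right) \right| \tag{27} \]

for each $n \in \{1, 2, \ldots\}$. But, for each $n \in \{1, 2, \ldots\}$ as well,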



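\[ \mathcal{D}[\mu_n\|\mu] = \sum_{i \in \mathbb{I}} p_i^{\mu_n} \log\left[\frac{p_i^{\mu_n}}{p_i^{\mu}}\right] = \sum_{i \in \mathbb{I}_{\mu}} p_i^{\mu_n} \log\left[\frac{p_i^{\mu_n}}{p_i^{\mu}}\right] \]

($\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{AC}(\mathbb{X}\|\mu)$ and $0\log\left[\frac{0}{0}\right] = 0$ by convention), and

\[ \sum_{i \in \mathbb{I}_{\mu}} p_i^{\mu_n} \log\left[\frac{p_i^{\mu_n}}{p_i^{\mu}}\right] = \sum_{i \in \mathbb{I}_{\mu}} \frac{p_i^{\mu_n}}{p_i^{\mu}} \log\left[\frac{p_i^{\mu_n}}{p_i^{\mu}}\right] p_i^{\mu} = \int_{\mathbb{X}_{\mu}} \frac{p^{\mu_n}}{p^{\mu}} \log\left[\frac{p^{\mu_n}}{p^{\mu}}\right] d\mu. \]

Also, there exists $M' \in \mathbb{R}_+$ such that

\[ \left| \frac{p^{\mu_n}(x)}{p^{\mu}(x)} \log\left[\frac{p^{\mu_n}(x)}{p^{\mu}(x)}\right] \right| \le M', \]

for each $x \in \mathbb{X}_{\mu}$ and $n \in \{1, 2, \ldots\}$, the constant $M'$ being $\mu$-integrable since $\mu$ is a
probability measure. Then, since $p^{\mu_n} \to p^{\mu}$ pointwise on $\mathbb{X}_{\mu}$ as $n \uparrow \infty$, and therefore

\[ \frac{p^{\mu_n}}{p^{\mu}} \log\left[\frac{p^{\mu_n}}{p^{\mu}}\right] \to 0 \]

(recall $0(x) = 0$, $x \in \mathbb{X}$), pointwise on $\mathbb{X}_{\mu}$ and as $n \uparrow \infty$ as well, by Lebesgue's Dominated
Convergence Theorem we conclude the convergence

\[ \int_{\mathbb{X}_{\mu}} \frac{p^{\mu_n}}{p^{\mu}} \log\left[\frac{p^{\mu_n}}{p^{\mu}}\right] d\mu \to 0 \]

as $n \uparrow \infty$, and thus the claimed convergence $\mathcal{D}[\mu_n\|\mu] \to 0$ as $n \uparrow \infty$ too. Now, note for
each $n \in \{1, 2, \ldots\}$ we also have

\[ \left| \sum_{i \in \mathbb{I}} \log[p_i^{\mu}] \left( p_i^{\mu_n} - p_i^{\mu} \right) \right| \le \sum_{i \in \mathbb{I}_{\mu}} \left| \log[p_i^{\mu}] \right| \left| \frac{p_i^{\mu_n}}{p_i^{\mu}} - 1 \right| p_i^{\mu} \]

and

\[ \sum_{i \in \mathbb{I}_{\mu}} \left| \log[p_i^{\mu}] \right| \left| \frac{p_i^{\mu_n}}{p_i^{\mu}} - 1 \right| p_i^{\mu} = \int_{\mathbb{X}_{\mu}} \left| \log[p^{\mu}] \right| \left| \frac{p^{\mu_n}}{p^{\mu}} - 1 \right| d\mu \]

(recall $1(x) = 1$, $x \in \mathbb{X}$). But,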



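\[ \left| \log[p^{\mu}(x)] \right| \left| \frac{p^{\mu_n}(x)}{p^{\mu}(x)} - 1 \right| \le M'' \left( -\log[p^{\mu}(x)] \right) \]

for each $x \in \mathbb{X}_{\mu}$ and $n \in \{1, 2, \ldots\}$, with $M'' = M + 1 \in \mathbb{R}_+$, and

\[ \int_{\mathbb{X}_{\mu}} M'' \left( -\log[p^{\mu}] \right) d\mu = M'' \left( -\sum_{i \in \mathbb{I}} p_i^{\mu} \log[p_i^{\mu}] \right) = M'' \mathcal{H}[\mu] < \infty. \]

In addition, by the same arguments as before,

\[ \left| \log[p^{\mu}] \right| \left| \frac{p^{\mu_n}}{p^{\mu}} - 1 \right| \to 0 \]

pointwise on $\mathbb{X}_{\mu}$ as $n \uparrow \infty$. Hence, once again by Lebesgue's Dominated Convergence
Theorem we conclude

\[ \int_{\mathbb{X}_{\mu}} \left| \log[p^{\mu}] \right| \left| \frac{p^{\mu_n}}{p^{\mu}} - 1 \right| d\mu \to 0 \]

as $n \uparrow \infty$, and therefore

\[ \sum_{i \in \mathbb{I}} \log[p_i^{\mu}] \left( p_i^{\mu_n} - p_i^{\mu} \right) \to 0 \tag{29} \]

as $n \uparrow \infty$ too. The remaining claimed convergence $\mathcal{H}[\mu_n] \to \mathcal{H}[\mu]$ as $n \uparrow \infty$ then follows
from the convergence $\mathcal{D}[\mu_n\|\mu] \to 0$ as $n \uparrow \infty$ and equations (27) and (29), proving the
theorem. □

We have the following two corollaries to Theorem 5.1. For the first, let us define
$[\mathbb{H} \cap \mathcal{P}_+](\mathbb{X}) = \mathbb{H}(\mathbb{X}) \cap \mathcal{P}_+(\mathbb{X})$ with $\mathcal{P}_+(\mathbb{X})$ the collection of all $\mu \in \mathcal{P}(\mathbb{X})$ satisfying $p_i^{\mu} > 0$
for each $i \in \mathbb{I}$.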

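Corollary 5.1. Let $\mu \in [\mathbb{H} \cap \mathcal{P}_+](\mathbb{X})$ and $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{P}(\mathbb{X})$ be such that $\mu_n \Rightarrow \mu$ as $n \uparrow \infty$
and

\[ \sup_{n \in \{1,2,\ldots\},\, i \in \mathbb{I}} \frac{p_i^{\mu_n}}{p_i^{\mu}} < \infty. \]

Then, $\{\mu_n\}_{n=1}^{\infty} \subset \mathbb{H}(\mathbb{X}) \cap \mathbb{H}(\mathbb{X}\|\mu)$ and we have both

\[ \mathcal{D}[\mu_n\|\mu] \to 0 \quad \text{and} \quad \mathcal{H}[\mu_n] \to \mathcal{H}[\mu] \]

as $n \uparrow \infty$.

Proof. The result follows from Theorem 5.1 in view of Lemma 5.1. □

Corollary 5.2. Let $\mu \in \mathbb{H}(\mathbb{X})$ and $\{\mu_n\}_{n=1}^{\infty} \subset \mathcal{AC}(\mathbb{X}\|\mu)$ be such that

\[ \sup_{n \in \{1,2,\ldots\},\, i \in \mathbb{I}_{\mu}} \frac{p_i^{\mu_n}}{p_i^{\mu}} < \infty. \tag{30} \]

Then, $\{\mu_n\}_{n=1}^{\infty} \subset \mathbb{H}(\mathbb{X}) \cap \mathbb{H}(\mathbb{X}\|\mu)$ and, if $\mathcal{D}[\mu_n\|\mu] \to 0$ as $n \uparrow \infty$, we have

\[ \mathcal{H}[\mu_n] \to \mathcal{H}[\mu] \]

as $n \uparrow \infty$ as well.

Proof. The result follows from Theorem 5.1 in view of Remark 3.2 and Lemma 5.1. □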
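A concrete infinitely supported instance of Theorem 5.1 and Corollary 5.2 can be checked numerically. The sketch below is an illustration under assumptions not taken from the text: $\mu$ is the law $p_i^{\mu} = 2^{-i}$ on $i = 1, 2, \ldots$, for which $\mathcal{H}[\mu] = 2$ bits, and $\mu_n$ is $\mu$ restricted to $\{1, \ldots, n\}$ and renormalised. Then $\mu_n$ is absolutely continuous w.r.t. $\mu$, the ratio $p_i^{\mu_n}/p_i^{\mu}$ is bounded by 2, and $\mathcal{D}[\mu_n\|\mu] = -\log_2(1 - 2^{-n})$. All function names are hypothetical.

```python
import math

def mu_prob(i):
    """p_i^mu = 2^(-i) for i = 1, 2, ... (illustrative choice of mu)."""
    return 2.0 ** (-i)

def truncated(n):
    """Probabilities of mu_n: mu restricted to {1, ..., n}, renormalised."""
    z = sum(mu_prob(i) for i in range(1, n + 1))
    return [mu_prob(i) / z for i in range(1, n + 1)]

def entropy_bits(p):
    """Shannon entropy in bits with the convention 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0.0)

def kl_to_mu_bits(p):
    """D[mu_n || mu] in bits for p supported on {1, ..., len(p)}."""
    return sum(x * math.log2(x / mu_prob(i))
               for i, x in enumerate(p, start=1) if x > 0.0)

def max_ratio(p):
    """Largest ratio p_i^{mu_n} / p_i^mu over the support of p."""
    return max(x / mu_prob(i) for i, x in enumerate(p, start=1))

if __name__ == "__main__":
    for n in (2, 5, 20, 40):
        p = truncated(n)
        print(n, max_ratio(p), kl_to_mu_bits(p), abs(entropy_bits(p) - 2.0))
```

In this example the ratio bound holds uniformly in $n$, the discriminant tends to 0, and the entropies approach $\mathcal{H}[\mu] = 2$ bits, consistent with the two results above.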

Remark 5.4. In the context of continuity versus pure convergence properties of Shannon
entropy discussed in Section 1, note Corollary 5.2 establishes the convergence $\mathcal{H}[\mu_n] \to \mathcal{H}[\mu]$
as $n \uparrow \infty$, under the convergence $\mathcal{D}[\mu_n\|\mu] \to 0$ as $n \uparrow \infty$ as well, by exploiting an underlying
structure relating $\{\mu_n\}_{n=1}^{\infty}$ to $\mu$ (condition (30)). In contrast, by imposing the stronger
requirement on $\mu$ of being power dominated (stronger than just $\mu \in \mathbb{H}(\mathbb{X})$; see [10] for
the definition of a power dominated distribution), the continuity result [10, Theorem 21,
p. 16] establishes the corresponding entropy convergence, in a discrete setting too, for all
approximating sequences converging in the above Kullback-Leibler discriminant sense.

6 Conclusion



Results on convergence of Shannon entropy have been established for both the differential
and discrete settings. In the differential case, it was shown that weak convergence of
the underlying probability measures is not enough for convergence of the associated
differential entropies. Differential entropy convergence was then established for densities
with fairly general supports in terms of the Kullback-Leibler discriminant, and it was also shown
that under an appropriate boundedness condition, the stronger convergence in variation
of the underlying probability measures does indeed guarantee the desired differential
entropy convergence. A general result for differential entropy convergence was also provided
in terms of a pointwise convergence condition, accounting for compactly and non-compactly
supported densities. In the discrete case, it was shown that convergence in distribution
and in variation of probability measures become equivalent, trivially guaranteeing
convergence of all information measures in the finitely supported case. Results on entropy and
Kullback-Leibler discriminant convergence were also established in this setting for possibly
infinitely supported probability measures.

We believe the results presented here will find a wide scope of applicability, especially
in light of the great generality allowed for the support sets of the probability measures
involved.